1. Apache Spark Official Documentation
RDD Programming Guide: In the "Lazy Evaluation" section, it states: "All transformations in Spark are lazy, in that they do not compute their results right away. Instead, they just remember the transformations applied to some base dataset... The transformations are only computed when an action requires a result to be returned to the driver program."
Source: Apache Spark Documentation, RDD Programming Guide, Section: Lazy Evaluation.
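
A minimal Scala sketch of the behavior the guide describes, assuming a local Spark setup (the object name, dataset, and values here are illustrative, not from the documentation):

    import org.apache.spark.{SparkConf, SparkContext}

    object LazyRddDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("lazy-demo").setMaster("local[*]"))

        // Transformation: nothing is computed here. Spark only records
        // the lineage (parallelize -> map) against the base dataset.
        val base    = sc.parallelize(1 to 1000)
        val doubled = base.map(_ * 2) // lazy: no job is launched yet

        // Action: this is the point where the remembered transformations
        // actually execute and a result is returned to the driver program.
        val total = doubled.reduce(_ + _)
        println(s"total = $total")

        sc.stop()
      }
    }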
2. Databricks Documentation
Introduction to Apache Spark: "Apache Spark uses lazy evaluation for transformations. Transformations are lazy operations, meaning that they are not executed until an action is called. This allows Spark to optimize the query plan by pipelining transformations."
Source: Databricks Documentation, "Developer tools, languages, and APIs", "Introduction to Apache Spark".
3. Zaharia, M., et al. (2010). Spark: Cluster Computing with Working Sets. This foundational academic paper on Spark states: "RDDs support two types of operations: transformations, which create a new dataset from an existing one, and actions, which return a value to the driver program after running a computation on the dataset. [...] All transformations in Spark are lazy, in that they do not compute their results right away."
Source: Zaharia, M., et al. (2010). Spark: Cluster Computing with Working Sets. USENIX HotCloud'10, Page 2, Section 3.1 RDD Operations.
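
The paper's two-way split between transformations and actions can be made concrete with one more short sketch (a local setup is assumed; the data and names are illustrative). The transformation yields a new, still-unevaluated dataset, while each action returns a plain value to the driver:

    import org.apache.spark.{SparkConf, SparkContext}

    object OperationsDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("ops-demo").setMaster("local[*]"))

        val words = sc.parallelize(Seq("spark", "lazy", "evaluation"))

        // Transformation: creates a new dataset from an existing one,
        // without computing anything yet.
        val lengths = words.map(_.length)

        // Actions: each returns a concrete value to the driver program
        // after running a computation on the dataset.
        val n   = lengths.count()   // Long
        val all = lengths.collect() // Array[Int]
        println(s"n = $n, lengths = ${all.mkString(", ")}")

        sc.stop()
      }
    }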