1. Apache Spark Official Documentation - Glossary:
Job: "A parallel computation consisting of multiple tasks that gets spawned in response to a Spark action (e.g. save
collect)..."
Stage: "Each job gets divided into smaller sets of tasks called stages that depend on each other..."
Task: "A unit of work that will be sent to one executor."
Source: Apache Spark 3.4.1 Documentation, "Glossary".
2. Databricks Documentation - Spark UI - Jobs Tab:
"The Jobs tab displays a summary of all jobs in the Spark application... The job detail page shows a visualization of the DAG. In the DAG
vertices represent the RDDs or DataFrames and the edges represent the operations to be applied... The DAG is also organized into stages." This documentation visually and textually confirms that jobs are broken into stages
which in turn consist of tasks.
Source: Databricks Documentation, "Spark UI - Jobs tab".
3. Learning Spark, 2nd Edition (by Databricks employees):
Chapter 13, "How Spark Executes a Program", page 304: "When the driver runs, it converts the user’s program into units of physical execution called tasks. Each task is a combination of a chunk of data and a computation to be performed on that chunk. All of this is orchestrated by the driver, which launches tasks on the cluster. A set of tasks is called a stage, and a set of stages is called a job."
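A small sketch of the task-per-chunk relationship described in that passage (the partition count of 8 is an arbitrary assumption): each task pairs one partition of data with the computation to run on it, so a stage executes as many tasks as its input has partitions.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[4]").appName("stage-demo").getOrCreate()
    sc = spark.sparkContext

    rdd = sc.parallelize(range(1_000), numSlices=8)
    print(rdd.getNumPartitions())   # 8 partitions -> 8 tasks in the stage below

    # count() involves no shuffle, so the driver schedules it as a single stage
    # of 8 tasks; adding a wide transformation would introduce a second stage.
    print(rdd.count())

    spark.stop()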