1. Apache Spark Documentation, Pandas API on Spark, Internals: "Internally, pandas API on Spark DataFrames are composed of a Spark DataFrame and an 'internal frame'. The internal frame holds the information about index and column labels to map the pandas-like API to the Spark DataFrame." This directly supports that it is made up of a Spark DataFrame and additional metadata.
Source: Apache Spark 3.5.1 Documentation, Pandas API on Spark, Internals section.
2. Databricks Documentation, Pandas API on Spark: "The pandas API on Spark provides pandas-equivalent APIs that work on Apache Spark... You can create a pandas API on Spark DataFrame by calling pyspark.pandas.from_pandas or pyspark.pandas.read_csv. You can also convert to and from pandas API on Spark DataFrames and PySpark DataFrames..." This demonstrates the direct relationship and interoperability, refuting the claim that they are unrelated (E) and confirming that they are built upon Spark's foundation.
Source: Databricks Documentation > Develop on Databricks > Libraries and scripts > Pandas API on Spark.
3. Learning Spark, 2nd Edition (O'Reilly), Chapter 11: Pandas API on Spark: "The pandas API on Spark was created to provide a pandas-like API on top of Spark, so that data scientists can make an easy transition from a single-node machine to a distributed environment... Under the hood, every pandas API on Spark DataFrame is backed by a PySpark DataFrame."
Source: Chambers, B., & Zaharia, M. (2020). Learning Spark, 2nd Edition. O'Reilly Media, Inc. Chapter 11, "What Is the pandas API on Spark?" section.