1. Apache Spark™ 3.5.1 Documentation, Pandas API on Spark, Overview: "The pandas API on Spark fills this gap by providing pandas-equivalent APIs that work with Apache Spark. pandas API on Spark is useful not only for pandas users but also PySpark users, because the pandas API on Spark supports many tasks that are difficult to do with PySpark". This highlights the goal of providing pandas features on the scalable Spark engine.
2. Databricks Documentation, Pandas API on Spark: "The Koalas project, now the pandas API on Spark, makes data scientists more productive when interacting with big data, by implementing the pandas DataFrame API on top of Apache Spark." This directly states the purpose is to implement the pandas API on top of Spark for big data productivity.
3. Learning Spark, 2nd Edition, Chapter 11: The Pandas API on Spark: "The key goal of the pandas API on Spark is to provide a familiar API for data scientists and engineers already comfortable with pandas to leverage the power of the distributed Spark engine for big data." This confirms the combination of a familiar API with Spark's distributed power.
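
The "familiar API" point in these references can be illustrated with a minimal sketch. It is not taken from any of the cited sources. The helper names `mean_by_group` and `run_on_spark` and the sample data are hypothetical, chosen only for illustration. The sketch assumes pandas is installed. The Spark path additionally assumes a working PySpark installation (with PyArrow and a Java runtime), so it is defined but not called here.

```python
import pandas as pd


def mean_by_group(lib):
    """Compute a per-group mean using whichever DataFrame library is passed in.

    `lib` is expected to be either the `pandas` module or `pyspark.pandas`;
    the calls below are written against the API the two have in common.
    """
    df = lib.DataFrame(
        {"group": ["a", "a", "b"], "value": [1.0, 3.0, 5.0]}
    )
    # Same groupby/mean syntax in both libraries.
    means = df.groupby("group")["value"].mean()
    # to_dict() brings the (small) result back as a plain Python dict.
    return means.to_dict()


def run_on_spark():
    """Run the same logic on Spark (assumes pyspark, pyarrow, and Java)."""
    import pyspark.pandas as ps  # imported lazily so pandas-only setups still work

    return mean_by_group(ps)


# Local, single-machine run with plain pandas.
local_result = mean_by_group(pd)
print(local_result)
```

Calling `run_on_spark()` in an environment with Spark available should execute the same `groupby` logic on the distributed engine, which is the combination the quotes above describe. Only the imported module changes, not the DataFrame code.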