1. Databricks Documentation
"What is pandas API on Spark?": "Pandas API on Spark makes data scientists productive with big data, by allowing them to use the pandas API they are familiar with to work with terabyte-scale datasets... with minimal code change."
Source: Databricks Documentation > Apache Spark > Pandas API on Spark > Overview.
2. Apache Spark Documentation
"Pandas API on Spark": "It provides pandas-equivalent APIs that work on Apache Spark. Pandas API on Spark is useful for users who are already familiar with pandas and want to leverage Spark for big data."
Source: Apache Spark™ 3.5.0 Documentation > PySpark > Pandas API on Spark.
3. Databricks Blog
"Koalas: Easy Transition from pandas to Apache Spark": "The Koalas project makes data scientists more productive when interacting with big data, by implementing the pandas DataFrame API on top of Apache Spark... Data scientists can now make a seamless transition from a single machine to a distributed environment."
Source: Databricks Blog, "Koalas: Easy Transition from pandas to Apache Spark", January 24, 2020.