1. Apache Spark 3.5.0 Documentation, pyspark.sql.DataFrame.summary: "Computes specified statistics for numeric and string columns... If no statistics are given, this function computes count, mean, stddev, min, approximate quartiles (25%, 50%, 75%), and max." This confirms that summary() provides the necessary quartiles for IQR.
Source: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.summary.html
2. Apache Spark 3.5.0 Documentation, pyspark.sql.DataFrame.describe: "Computes basic statistics for numeric and string columns... For numeric columns, the result includes count, mean, stddev, min, max." This source verifies that .describe() lacks the required percentile information.
Source: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.describe.html
3. Databricks Documentation, DataFrames: "You can use describe to see summary statistics for a DataFrame... To see more statistics, including quartiles, use the summary method." This official Databricks source explicitly differentiates between the two methods and highlights summary() for calculating quartiles.
Source: Databricks Documentation > Get started > DataFrames > Python > Summarize and visualize data. (Specific page URLs change, but the content is consistently found in the introductory DataFrame tutorials).
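
The distinction these sources draw can be illustrated with a minimal sketch. It assumes a local PySpark installation and a hypothetical numeric column named value; the helper iqr_from_summary and the demo function are illustrative names, not part of PySpark. The helper only relies on summary() returning one row per statistic, with the statistic name in the "summary" column and values as strings.

```python
def iqr_from_summary(rows, column):
    """Compute Q3 - Q1 from rows shaped like df.summary("25%", "75%").collect().

    Each row must support row["summary"] (the statistic name) and
    row[column] (the statistic's value, which Spark returns as a string).
    """
    stats = {row["summary"]: float(row[column]) for row in rows}
    return stats["75%"] - stats["25%"]


def demo():
    # Assumption: PySpark is installed. It is imported here so the helper
    # above stays usable (and testable) without a Spark runtime.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]").appName("iqr-sketch").getOrCreate()
    )
    # Hypothetical sample data: the values 1.0 through 100.0 in one column.
    df = spark.createDataFrame([(float(v),) for v in range(1, 101)], ["value"])

    # describe() reports count, mean, stddev, min, and max, but no percentiles.
    df.describe("value").show()

    # summary() accepts percentile strings, so the quartiles can be requested.
    rows = df.summary("25%", "75%").collect()
    print("IQR:", iqr_from_summary(rows, "value"))

    spark.stop()
```

Calling demo() runs the comparison end to end. Because summary() reports approximate quartiles, the printed IQR may differ slightly from an exact calculation.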