1. dbt Labs Official Documentation (SQL Style Guide): The official dbt style guide explicitly recommends preferring UNION ALL over UNION for performance reasons.
Source: dbt Labs, "SQL Style Guide"
Reference: In the "Be specific" section, the guide states: "Prefer union all to union. The union command performs a union all then a distinct. If you don't need to remove duplicate records, union all is much more performant."
Link: https://docs.getdbt.com/guides/best-practices/how-we-style-our-dbt-projects/3-be-specific#prefer-union-all-to-union
2. dbt Labs Official Documentation (dbt-utils package): The widely used union_relations macro in the dbt-utils package defaults to UNION ALL, reinforcing it as a best practice for performance.
Source: dbt Labs, dbt-utils Package Documentation, union_relations
Reference: The macro is designed to "stack a list of relations on top of each other" and, by default, it generates a series of select ... from ... statements joined with union all. This design choice prioritizes performance.
Link: https://github.com/dbt-labs/dbt-utils#unionrelations-source
3. Vendor Documentation (Google Cloud for BigQuery): The documentation for major data warehouses, which execute the SQL generated by dbt, confirms this performance principle.
Source: Google Cloud, "BigQuery optimized SQL patterns: part 1"
Reference: In the section on "Set operators", the documentation advises: "If your data doesn't have duplicates or you don't need to eliminate them, use UNION ALL for better performance. BigQuery must do extra work to remove duplicate rows for UNION DISTINCT (or UNION)."
Link: https://cloud.google.com/blog/products/bigquery/bigquery-optimized-sql-patterns-part-1
4. Vendor Documentation (Snowflake): Snowflake's documentation also highlights the performance advantage of UNION ALL.
Source: Snowflake Documentation, "UNION & UNION ALL"
Reference: The documentation notes: "UNION ALL is much faster than UNION because UNION ALL does not search for and remove duplicate rows."
Link: https://docs.snowflake.com/en/sql-reference/operators-set#union-union-all
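The behaviour the sources above describe (UNION is a UNION ALL followed by a distinct step, and UNION ALL simply stacks rows) can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in warehouse; this is an assumption, since SQLite is not one of the cited engines, although it follows the same standard UNION / UNION ALL semantics for this case. The table names and the stack_relations helper are hypothetical: the helper only imitates the select ... union all shape described for union_relations and is not the dbt-utils macro.

```python
import sqlite3

# In-memory SQLite database standing in for a warehouse (assumption: SQLite's
# UNION / UNION ALL semantics match the cited warehouses for this example).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table orders_2023 (order_id integer, amount integer);
    create table orders_2024 (order_id integer, amount integer);
    insert into orders_2023 values (1, 10), (2, 20);
    -- order 2 appears in both tables: a cross-table duplicate
    insert into orders_2024 values (2, 20), (3, 30);
""")


def stack_relations(relations, set_operator="union all"):
    """Hypothetical helper: build one query that stacks the given tables.

    Loosely mirrors the select ... union all ... shape described for
    dbt-utils' union_relations; it is NOT that macro's implementation.
    """
    selects = [f"select order_id, amount from {name}" for name in relations]
    return f"\n{set_operator}\n".join(selects)


tables = ["orders_2023", "orders_2024"]

# UNION ALL: rows are concatenated as-is; no duplicate check is performed.
all_rows = conn.execute(stack_relations(tables, "union all")).fetchall()

# UNION: the same rows, then duplicates are removed (the extra work).
distinct_rows = conn.execute(stack_relations(tables, "union")).fetchall()

# UNION gives the same rows as SELECT DISTINCT over the UNION ALL result.
dedup_of_all = conn.execute(
    f"select distinct * from ({stack_relations(tables, 'union all')})"
).fetchall()

print(len(all_rows), len(distinct_rows))  # 4 3
```

The sketch shows the semantic difference rather than a timing: UNION ALL is only a safe substitute for UNION when duplicates cannot occur or do not need to be removed, which is exactly the condition the BigQuery and dbt guidance attach to the recommendation.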