1. Official Vendor Documentation (Apache Spark, the foundation of Databricks ML): The MLlib programming guide for Gradient-Boosted Trees (GBTs) states: "GBTs train decision trees one by one, where each new tree helps to correct the errors of the previously trained ensemble of trees. The training of each tree is dependent on the previously trained trees." This highlights the sequential nature of the algorithm; a usage sketch follows the citation below.
Source: Apache Spark 3.5.0 MLlib Guide, "Classification and regression - Gradient-boosted trees (GBTs)," Algorithm section.
2. Academic Publication: The canonical textbook "The Elements of Statistical Learning" describes the gradient boosting algorithm as a forward stagewise procedure. The algorithm is presented as a loop from m = 1 to M, where each step m explicitly uses the model f_{m-1}(x) from the preceding step to compute the residuals and fit the next base learner h_m(x). This iterative dependency is fundamental to the method; see the sketch after the citation below.
Source: Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer. Chapter 10, "Boosting and Additive Trees," Algorithm 10.3, page 359.
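
A minimal sketch of that forward stagewise loop for squared-error loss, where the residual equals the negative gradient (my own illustration, not the book's code; the learning rate nu and the tree depth are arbitrary choices):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def forward_stagewise_boost(X, y, M=100, nu=0.1):
        f = np.full(len(y), y.mean())   # f_0(x): constant initial model
        trees = []
        for m in range(1, M + 1):
            r = y - f                                         # residuals of f_{m-1}
            h = DecisionTreeRegressor(max_depth=3).fit(X, r)  # fit h_m to the residuals
            f = f + nu * h.predict(X)                         # f_m = f_{m-1} + nu * h_m
            trees.append(h)
        return trees

Because each round reads the predictions f left behind by the previous round, the M iterations cannot be dispatched concurrently.
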
3. Academic Publication: The paper introducing XGBoost, a highly optimized gradient boosting implementation, addresses this challenge directly. It explains that while tree creation is sequential across boosting rounds, scalability is achieved by parallelizing the work within each tree's construction, specifically the split-finding step. This distinction confirms that the overall boosting process is iterative and not parallelizable across trees; a usage sketch follows the citation below.
Source: Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Section 2.1, "Regularized Learning Objective." DOI: https://doi.org/10.1145/2939672.2939785