Question 5 - Databricks Machine Learning Associate Real Exam Questions [Feb 2026 Update]

Q: 5

A data scientist wants to parallelize the training of trees in a gradient boosted tree to speed up the training process. A colleague suggests that parallelizing a boosted tree algorithm can be difficult. Which of the following describes why?

Options

Correct Answer:

Explanation

Gradient Boosting is an ensemble learning technique that builds models in a sequential, stage-wise fashion. Each new model (typically a decision tree) is trained to correct the errors or residuals of the ensemble of all previously trained models. This means the construction of the second tree depends entirely on the outcome of the first tree, the third tree depends on the first two, and so on. This inherent sequential dependency, where each iteration requires the output from the previous one, is the fundamental reason why the overall tree-building process cannot be easily parallelized.

Why Incorrect

A. Parallelization is a general computing concept not restricted to linear algebra; many non-linear algorithms can be parallelized.

B. Many parallel algorithms work by distributing data across nodes; needing access to all data does not inherently prevent parallelization.

C. The calculation of gradients across data points can often be parallelized; this is not the primary bottleneck for the overall algorithm.

References

1. Official Vendor Documentation (Apache Spark

the foundation of Databricks ML): In the MLlib programming guide for Gradient-Boosted Trees (GBTs)

the documentation states: "GBTs train decision trees one by one

where each new tree helps to correct the errors of the previously trained ensemble of trees. The training of each tree is dependent on the previously trained trees." This highlights the sequential nature of the algorithm.

Source: Apache Spark 3.5.0 MLlib Guide

"Classification and regression - Gradient-boosted trees (GBTs)

" Algorithm section.

2. Academic Publication: The canonical textbook "The Elements of Statistical Learning" describes the gradient boosting algorithm as a forward stagewise procedure. The algorithm is presented as a loop from m=1 to M

where each step m explicitly uses the model f{m-1}(x) from the preceding step to compute the residuals and fit the next base learner hm(x). This iterative dependency is fundamental to the method.

Source: Hastie

Tibshirani

& Friedman

J. (2009). The Elements of Statistical Learning: Data Mining

Inference

and Prediction. Springer. Chapter 10

"Boosting and Additive Trees

" Algorithm 10.3

page 359.

3. Academic Publication: The paper introducing XGBoost

a highly optimized gradient boosting implementation

discusses this challenge. It explains that while the inter-tree creation is sequential

they achieve scalability by parallelizing the process within each tree's construction

specifically the split-finding part. This distinction confirms that the overall boosting process is iterative and not parallelizable across trees.

Source: Chen

& Guestrin

C. (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Section 2.1

"Regularized Learning Objective." DOI: https://doi.org/10.1145/2939672.2939785

Premium Access Includes

FLASH OFFER

avail 10% DISCOUNT on YOUR PURCHASE