Q: 10
You are experimenting with a built-in distributed XGBoost model in Vertex AI Workbench user-
managed notebooks. You use BigQuery to split your data into training and validation sets using the
following queries:
CREATE OR REPLACE TABLE `myproject.mydataset.training` AS
(SELECT * FROM `myproject.mydataset.mytable` WHERE RAND() < 0.8);
CREATE OR REPLACE TABLE `myproject.mydataset.validation` AS
(SELECT * FROM `myproject.mydataset.mytable` WHERE RAND() < 0.2);
After training the model, you achieve an area under the receiver operating characteristic curve (AUC
ROC) value of 0.8, but after deploying the model to production, you notice that your model
performance has dropped to an AUC ROC value of 0.65. What problem is most likely occurring?
Options
Discussion
Wait, but doesn’t the RAND() approach here mean some records show up in both training and validation tables, not necessarily every record? Feels like partial overlap (option C) is the bigger issue, especially since D would only ever happen if you got super unlucky with a tiny dataset. Am I missing something?
C. D is a trap: it's not that every record is duplicated, just that some overlap occurs because RAND() is evaluated separately for each row in each query. I've seen this come up on similar questions.
C
D here. Since RAND() < 0.2 runs separately, it's possible (though rare) for every record to satisfy both conditions and end up in both tables, especially if the dataset is tiny or badly randomized. Not totally sure, open to other takes.
D, since if a row gets RAND() < 0.2 both times, it's in both sets for sure. So it's technically possible every record lands in both if you're really unlucky, especially with small tables. Not totally confident though, might be missing something about typical overlap rates.
Yeah this is definitely C. The way RAND() works in both queries means some records will end up in both tables, which messes with your validation accuracy. Pretty common pitfall if you aren't using a deterministic split like FARM_FINGERPRINT. Agree?
D isn't right here. C is the real issue: separate RAND() calls mean some records land in both sets, so training data leaks into your validation set. Not 100% sure there isn't a tiny edge case for D with a tiny dataset, but C matches what usually happens.
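A quick way to sanity-check the partial-vs-total overlap argument is a simulation (a sketch, not BigQuery itself; it just assumes each query draws an independent uniform value per row, which is how RAND() behaves across two separate queries):

```python
import random

random.seed(42)

n_rows = 100_000
row_ids = range(n_rows)

# Each query evaluates RAND() independently for every row.
training = {r for r in row_ids if random.random() < 0.8}
validation = {r for r in row_ids if random.random() < 0.2}

overlap = training & validation

# Expected overlap is 0.8 * 0.2 = 16% of all rows: partial, not total.
print(f"training: {len(training)}, validation: {len(validation)}")
print(f"overlap: {len(overlap)} ({len(overlap) / n_rows:.1%})")
```

Roughly 16% of rows end up in both tables, which is exactly the "some overlap, not every record" situation described above.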
C tbh, partial overlap is the actual gotcha here, not full duplication. With RAND() like that you always risk leaking some records into both sets unless you hash on unique ids instead. If I'm wrong let me know, but pretty sure that's what trips people up.
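For anyone curious what "hash on unique ids instead" looks like, here's a minimal sketch of a deterministic split. It mirrors the FARM_FINGERPRINT approach but uses Python's hashlib as a stand-in, and the row keys are made up for illustration:

```python
import hashlib

def split_bucket(key: str, n_buckets: int = 10) -> int:
    # Deterministic hash of a unique row key (stand-in for FARM_FINGERPRINT).
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets

row_ids = [f"row-{i}" for i in range(1_000)]

# Buckets 0-7 -> training (~80%), buckets 8-9 -> validation (~20%).
training = [r for r in row_ids if split_bucket(r) < 8]
validation = [r for r in row_ids if split_bucket(r) >= 8]

# The same key always hashes to the same bucket, so the sets can't overlap.
assert not set(training) & set(validation)
assert len(training) + len(validation) == len(row_ids)
```

Because membership is a pure function of the row key, rerunning either query gives the same split every time, which is what RAND() fails to guarantee.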
C/D? If the question stressed you must avoid any overlap, D wins, but for practical leakage C is correct.
It's D, since if RAND() gives a value less than 0.2, those records would always be in both sets. It's a pretty extreme edge case, but technically that overlap could cover every row if the table is small enough or the random numbers lined up. Not 100% sure, so feel free to disagree.