Q: 1
You work as an analyst at a large banking firm. You are developing a robust, scalable ML pipeline to
train several regression and classification models. Your primary focus for the pipeline is model
interpretability. You want to productionize the pipeline as quickly as possible. What should you do?
Options
Discussion
B tbh, GKE with XGBoost custom training sounds scalable and lets you fine-tune stuff, which I thought was good for productionizing quickly. Plus XGBoost gives some model interpretability (like feature importance). Not 100% sure though, D might be more purpose-built for orchestration. Anyone see issues with B?
Why not just use A if you want to skip custom orchestration steps? Cloud Composer feels heavier for pure speed.
Pretty sure it's D. Official exam guide and practice questions both recommend Cloud Composer for pipeline orchestration and model interpretability.
It's D, I've seen a similar question in a mock and Cloud Composer fits this use case.
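On the interpretability point people keep raising: XGBoost does give you feature importance out of the box, whichever platform runs the training. Rough, self-contained sketch (toy data; everything here is made up just to show the calls):

import numpy as np
import xgboost as xgb

# Toy data just to make the example runnable.
X_train = np.random.rand(500, 8)
y_train = X_train @ np.random.rand(8) + 0.1 * np.random.rand(500)

model = xgb.XGBRegressor(n_estimators=100, max_depth=4)
model.fit(X_train, y_train)

# Per-feature importance; "gain" weights each split by how much it improved the objective.
importance = model.get_booster().get_score(importance_type="gain")
print(sorted(importance.items(), key=lambda kv: kv[1], reverse=True))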
Q: 2
You work for a bank. You have been asked to develop an ML model that will support loan application
decisions. You need to determine which Vertex AI services to include in the workflow. You want to
track the model's training parameters and the metrics per training epoch. You plan to compare the
performance of each version of the model to determine the best model based on your chosen
metrics. Which Vertex AI services should you use?
Options
Discussion
Option C is right here. ML Metadata logs the artifacts, Experiments helps with model version comparisons, and TensorBoard shows metrics per epoch. Pretty sure this trio covers exactly what's needed for tracking and evaluation.
B or D here. I figure Pipelines helps with end-to-end workflow tracking and Experiments (in B) or TensorBoard (in D) let you compare model versions and metrics. Not totally sure if ML Metadata is really required for tracking training parameters, though. Anyone disagree?
I'm not fully sure here, but think it's C. These sound like the tools for tracking experiments and training details? Can someone confirm if that's correct?
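To make C concrete, this is roughly what parameter and per-epoch metric tracking looks like with the Vertex AI Python SDK. Project, experiment and run names are placeholders, and I'm going from memory on the exact calls, so double-check the docs:

from google.cloud import aiplatform

# Placeholders: project, region, experiment and run names are made up for illustration.
aiplatform.init(project="my-project", location="us-central1", experiment="loan-model-exp")
aiplatform.start_run("run-v1")

# Training parameters for this run (ML Metadata stores these behind the scenes).
aiplatform.log_params({"learning_rate": 0.01, "batch_size": 64, "epochs": 10})

# Per-epoch metrics; this needs a TensorBoard instance backing the experiment. Dummy values here.
for epoch in range(10):
    aiplatform.log_time_series_metrics({"val_auc": 0.70 + 0.01 * epoch})

# Summary metrics used to compare model versions across runs in the Experiments UI.
aiplatform.log_metrics({"val_auc": 0.79})
aiplatform.end_run()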
Q: 3
You developed a custom model by using Vertex AI to forecast the sales of your company's products
based on historical transactional data. You anticipate changes in the feature distributions and the
correlations between the features in the near future. You also expect to receive a large volume of
prediction requests. You plan to use Vertex AI Model Monitoring for drift detection, and you want to
minimize the cost. What should you do?
Options
Discussion
Probably D. You want both features and attributions for more insight into drift, but keeping that prediction-sampling-rate down (closer to 0) is what really helps minimize cost as usage scales. Pretty sure that's the balanced approach if you want visibility but need to watch spend. Disagree?
B seems like a good pick because setting the prediction-sampling-rate closer to 1 should catch more drift cases with lots of data, which matters if you expect major changes. Plus, only monitoring features feels simpler and reduces complexity, so it might save costs too. Maybe I'm missing something about feature attributions though; open to being corrected.
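For anyone who wants to see where the sampling rate actually lives: it's set when you create the monitoring job. Rough sketch with the Vertex AI Python SDK; endpoint ID, feature names, thresholds and emails are placeholders, and the class names are from memory, so verify against the docs:

from google.cloud import aiplatform
from google.cloud.aiplatform import model_monitoring

aiplatform.init(project="my-project", location="us-central1")

# A low sample rate (close to 0) keeps cost down when prediction volume is high.
sampling = model_monitoring.RandomSampleConfig(sample_rate=0.1)

# Drift detection on the input features themselves.
objective = model_monitoring.ObjectiveConfig(
    drift_detection_config=model_monitoring.DriftDetectionConfig(
        drift_thresholds={"feature_a": 0.05, "feature_b": 0.05}
    )
)

job = aiplatform.ModelDeploymentMonitoringJob.create(
    display_name="sales-forecast-monitoring",
    endpoint="projects/my-project/locations/us-central1/endpoints/1234567890",
    logging_sampling_strategy=sampling,
    schedule_config=model_monitoring.ScheduleConfig(monitor_interval=24),
    objective_configs=objective,
    alert_config=model_monitoring.EmailAlertConfig(user_emails=["ml-team@example.com"]),
)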
Q: 4
You work for a pet food company that manages an online forum. Customers upload photos of their
pets on the forum to share with others. About 20 photos are uploaded daily. You want to
automatically and in near real time detect whether each uploaded photo has an animal. You want to
prioritize time and minimize the cost of your application development and deployment. What should
you do?
Options
Discussion
I don’t think D is right. A is better since Cloud Vision API gives you object localization out of the box, no need to train or label datasets. It’s faster and cheaper for a simple animal/not animal check, pretty sure that’s what they want.
B, not D. Vendor practice tests cover Cloud Vision API usage like this pretty often.
A. D is a trap since manual labeling and training take longer and cost more for this use case.
A makes sense for speed and cost. Cloud Vision API is ready to use, cuts out training and manual labeling. Practice with the official GCP console helps see how fast it works.
Cloud Vision API is probably the fastest and easiest here, so I'd pick A. Official docs and some hands-on with Vision API in the console are super helpful for this type of real-time detection use case. Anyone see a downside?
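No downside that I've hit. For reference, the zero-training path really is just a few lines with the Vision API client library (the file path and the animal keyword list are my own placeholders):

from google.cloud import vision

client = vision.ImageAnnotatorClient()

# Placeholder path for an uploaded forum photo.
with open("uploaded_photo.jpg", "rb") as f:
    image = vision.Image(content=f.read())

# Label detection returns generic labels with confidence scores; no training or labeling needed.
response = client.label_detection(image=image)
labels = {label.description.lower() for label in response.label_annotations}

has_animal = bool(labels & {"animal", "dog", "cat", "bird", "pet"})
print(has_animal, labels)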
Q: 5
You recently developed a deep learning model using Keras, and now you are experimenting with
different training strategies. First, you trained the model using a single GPU, but the training process
was too slow. Next, you distributed the training across 4 GPUs using tf.distribute.MirroredStrategy
(with no other changes), but you did not observe a decrease in training time. What should you do?
Options
Discussion
Probably D. If you just switch to MirroredStrategy but keep the same (small) batch size, each GPU isn't used efficiently so no real speed gain. Increasing the batch size lets GPUs process more data in parallel. Not 100% sure if dataset sharding is a trap here, but D is what I see in similar questions.
Is the question assuming we want to keep model accuracy exactly the same, or is a minor change in accuracy acceptable if we can finish training faster? That would make D a safer pick.
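Either way, the usual fix is to scale the global batch size by the number of replicas so each GPU gets a full per-replica batch. Minimal Keras sketch with a toy model and dataset, just to show the pattern:

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

# Scale the global batch so each of the 4 GPUs still receives a full per-replica batch.
per_replica_batch = 64
global_batch = per_replica_batch * strategy.num_replicas_in_sync

# Toy dataset; replace with the real input pipeline.
dataset = tf.data.Dataset.from_tensor_slices(
    (tf.random.uniform((10000, 32)), tf.random.uniform((10000, 1)))
).batch(global_batch).prefetch(tf.data.AUTOTUNE)

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

model.fit(dataset, epochs=5)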
Q: 6
You need to train a computer vision model that predicts the type of government ID present in a given
image using a GPU-powered virtual machine on Compute Engine. You use the following parameters:
• Optimizer: SGD
• Image shape: 224x224
• Batch size: 64
• Epochs: 10
• Verbose: 2
During training you encounter the following error: ResourceExhaustedError: Out of memory (OOM)
when allocating tensor. What should you do?
Options
Discussion
B tbh, since batch size eats up a lot of GPU memory fast. D might be tempting but you'd lose image detail for IDs, so not ideal here. Saw a similar question in practice and B was correct. Trap is thinking optimizer or learning rate helps!
Reducing batch size (B) is usually the first thing to try for a ResourceExhaustedError since it scales down tensor allocations pretty quickly. Lowering image shape (D) works too but risks losing critical features in ID images. Pretty sure B is expected, unless resolution drops are acceptable. Agree?
Why not D? Smaller image shape means less memory per input, might solve OOM too.
Guessing B here. Batch size directly impacts GPU memory usage, so that's usually the first thing to drop if you see ResourceExhaustedError. Saw a similar question on a practice exam and B was right. Anyone disagree?
It's B, batch size directly impacts GPU memory use. I've seen this error before; lowering the batch size fixes it fast. Agree?
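Agree on B. In Keras it's literally one argument, e.g. dropping from 64 to 16. Toy sketch (the real model would be a proper CNN over the 224x224 ID images):

import tensorflow as tf

# Toy stand-in for the ID classifier.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(224, 224, 3)),
    tf.keras.layers.Dense(4, activation="softmax"),
])
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")

x = tf.random.uniform((256, 224, 224, 3))
y = tf.random.uniform((256,), maxval=4, dtype=tf.int32)

# Smaller batches mean smaller activation tensors on the GPU, which is what the OOM is about.
model.fit(x, y, batch_size=16, epochs=10, verbose=2)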
Q: 7
Your data science team has requested a system that supports scheduled model retraining, Docker
containers, and a service that supports autoscaling and monitoring for online prediction requests.
Which platform components should you choose for this system?
Options
Discussion
B. Vertex AI Pipelines does scheduled retraining, Vertex AI Prediction covers online predictions with autoscaling, and Model Monitoring checks the deployed models. Not totally sure, but pretty sure the others each miss something here. Can someone confirm?
B is the only one that covers everything, with Pipelines for retraining and Prediction plus Model Monitoring to hit autoscaling and observability requirements. App Engine doesn't fit the container or prediction piece. Pretty sure this is what Google wants here.
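On the autoscaling piece of B: deploying an uploaded model to a Vertex AI endpoint with replica bounds looks roughly like this (project, model ID and machine type are placeholders):

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Placeholder resource name for an already-uploaded, containerized model.
model = aiplatform.Model("projects/my-project/locations/us-central1/models/1234567890")

endpoint = model.deploy(
    machine_type="n1-standard-4",
    min_replica_count=1,   # autoscaling floor
    max_replica_count=5,   # autoscaling ceiling under load
    traffic_percentage=100,
)
print(endpoint.resource_name)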
Q: 8
Your team has a model deployed to a Vertex AI endpoint. You have created a Vertex AI pipeline that
automates the model training process and is triggered by a Cloud Function. You need to prioritize
keeping the model up-to-date, but also minimize retraining costs. How should you configure
retraining?
Options
Discussion
Option D not B. Only D triggers on real feature drift so you don't retrain for no reason.
D imo, feature drift monitoring is best to avoid unnecessary retrains. B might look cheaper but doesn't account for when the model actually needs updating. Seen similar on practice tests, pretty sure D is what they want.
C vs D? Saw similar question on practice, both seem valid but C feels right since anomaly could trigger retrain too.
C/D? Both use anomaly detection but C doesn't explicitly mention drift, so not totally sure.
Nice question, really clear scenario. B
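To picture how D is wired up: the Model Monitoring alert can be routed to Pub/Sub, and the existing Cloud Function just submits the training pipeline when a drift message arrives. Hypothetical sketch; the bucket, template path and function name are made up:

import base64
import json

from google.cloud import aiplatform

def trigger_retraining(event, context):
    """Pub/Sub-triggered Cloud Function: retrain only when a drift alert arrives."""
    alert = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    print("Drift alert received:", alert)

    aiplatform.init(project="my-project", location="us-central1")

    # Re-run the existing training pipeline from its compiled template.
    job = aiplatform.PipelineJob(
        display_name="retrain-on-drift",
        template_path="gs://my-bucket/pipelines/training_pipeline.json",
        pipeline_root="gs://my-bucket/pipeline-root",
    )
    job.submit()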
Q: 9
You are developing a Kubeflow pipeline on Google Kubernetes Engine. The first step in the pipeline is
to issue a query against BigQuery. You plan to use the results of that query as the input to the next
step in your pipeline. You want to achieve this in the easiest way possible. What should you do?
Options
Discussion
D here, but honestly C looks ok too if you prefer custom stuff. Pretty sure D is quicker for most cases though.
It's C for me. Using the DSL with the Python BigQuery client lets you customize more and keep things in code, which I thought was best practice. But maybe D is simpler if you don't need anything special. Anyone think that's totally wrong?
C/D? Not sure honestly, both would do the job but D is probably easier since it's ready-made. Anyone disagree with that?
D imo, saw a similar question in some exam reports and the pre-built BigQuery component is the fastest way. No need to reinvent, just plug it in and go. Let me know if you think otherwise.
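D in code, for anyone curious. The google-cloud-pipeline-components library ships a pre-built BigQuery query op you can drop in as the first step. Project, dataset and query are placeholders, and I'm going from memory on the component name, so check the docs:

from kfp import dsl
from google_cloud_pipeline_components.v1.bigquery import BigqueryQueryJobOp

@dsl.pipeline(name="bq-first-step-pipeline")
def pipeline(project: str = "my-project"):
    # The pre-built component runs the query; its output artifact feeds the next step.
    query_step = BigqueryQueryJobOp(
        project=project,
        location="US",
        query="SELECT * FROM `my-project.mydataset.mytable` LIMIT 1000",
    )
    # A downstream component would consume query_step.outputs["destination_table"].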
Q: 10
You are experimenting with a built-in distributed XGBoost model in Vertex AI Workbench user-
managed notebooks. You use BigQuery to split your data into training and validation sets using the
following queries:
CREATE OR REPLACE TABLE `myproject.mydataset.training` AS
(SELECT * FROM `myproject.mydataset.mytable` WHERE RAND() < 0.8);
CREATE OR REPLACE TABLE `myproject.mydataset.validation` AS
(SELECT * FROM `myproject.mydataset.mytable` WHERE RAND() < 0.2);
After training the model, you achieve an area under the receiver operating characteristic curve (AUC
ROC) value of 0.8, but after deploying the model to production, you notice that your model
performance has dropped to an AUC ROC value of 0.65. What problem is most likely occurring?
Options
Discussion
A is wrong, C looks right. The issue is the RAND() split allows overlap between training and validation sets, so you evaluate on data already seen by the model, which hides real-world generalization problems. Pretty sure that’s what’s going on here.
C imo, I've seen a similar scenario in practice sets, and the queries here clearly allow overlapping splits.
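Agree with C, and the usual fix is a deterministic hash split instead of two independent RAND() filters, so every row lands in exactly one table. Sketch with the BigQuery client; it assumes a unique key column (transaction_id here is made up):

from google.cloud import bigquery

client = bigquery.Client(project="myproject")

# FARM_FINGERPRINT gives each row a stable bucket, so training and validation cannot overlap.
split_sql = """
CREATE OR REPLACE TABLE `myproject.mydataset.training` AS
SELECT * FROM `myproject.mydataset.mytable`
WHERE MOD(ABS(FARM_FINGERPRINT(CAST(transaction_id AS STRING))), 10) < 8;

CREATE OR REPLACE TABLE `myproject.mydataset.validation` AS
SELECT * FROM `myproject.mydataset.mytable`
WHERE MOD(ABS(FARM_FINGERPRINT(CAST(transaction_id AS STRING))), 10) >= 8;
"""
client.query(split_sql).result()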