Q: 11
You work at a subscription-based company. You have trained an ensemble of trees and neural
networks to predict customer churn, which is the likelihood that customers will not renew their
yearly subscription. The average prediction is a 15% churn rate, but for a particular customer the
model predicts that they are 70% likely to churn. The customer has a product usage history of 30%, is
located in New York City, and became a customer in 1997. You need to explain the difference
between the actual prediction, a 70% churn rate, and the average prediction. You want to use Vertex
Explainable AI. What should you do?
Options
Discussion
B. Some folks go for C, but integrated gradients are mostly for images and text, not tabular data like churn. Easy trap there.
B
I don’t think it’s B; I’d go with C. Integrated gradients sometimes get used for explainability, especially when you want to trace prediction changes as inputs shift from a baseline, and they might still help even if it’s not image data. Maybe there’s a trap here with Shapley?
Probably B, since Vertex Explainable AI's sampled Shapley explanations break down the prediction difference by feature. Makes sense for tabular data like this (churn, usage, location). Integrated gradients are more for images or text, so not a fit here. Pretty sure this matches what Google recommends. Disagree?
B tbh, because sampled Shapley in Vertex Explainable AI is the way to show individual feature contributions for tabular models. Confident pick.
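For anyone who wants to see what the sampled Shapley setup roughly looks like, here is a minimal sketch with the Vertex AI Python SDK, assuming a tabular churn model. The project, bucket, container image, feature names, and path_count value are all illustrative, not from the question:

    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1")  # placeholder project/region

    # Sampled Shapley attribution is the tabular-friendly method; path_count
    # trades attribution accuracy against computation cost.
    exp_params = aiplatform.explain.ExplanationParameters(
        {"sampled_shapley_attribution": {"path_count": 10}}
    )
    exp_metadata = aiplatform.explain.ExplanationMetadata(
        inputs={"features": {}},            # illustrative input mapping
        outputs={"churn_probability": {}},  # illustrative output name
    )

    model = aiplatform.Model.upload(
        display_name="churn-ensemble",               # illustrative
        artifact_uri="gs://my-bucket/churn-model/",  # illustrative
        serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-12:latest",  # placeholder
        explanation_parameters=exp_params,
        explanation_metadata=exp_metadata,
    )
    endpoint = model.deploy(machine_type="n1-standard-4")

    # Each explanation carries per-feature attributions that sum (approximately)
    # to the gap between this prediction (70%) and the baseline (15%).
    response = endpoint.explain(instances=[{"usage": 0.30, "city": "NYC", "customer_since": 1997}])

The attributions in that response are what you would hand over to explain the 55-point gap for this one customer.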
Q: 12
You are developing ML models with AI Platform for image segmentation on CT scans. You frequently
update your model architectures based on the newest available research papers, and have to rerun
training on the same dataset to benchmark their performance. You want to minimize computation
costs and manual intervention while having version control for your code. What should you do?
Options
Discussion
C/D? Cloud Composer (D) feels like a trap here since it polls on a schedule and doesn't handle versioning well. C lets you automate with repo triggers and proper code management. Pretty sure it's C but let me know if you think Composer makes sense for this use case.
Option D
Q: 13
You work for the AI team of an automobile company, and you are developing a visual defect
detection model using TensorFlow and Keras. To improve your model performance, you want to
incorporate some image augmentation functions such as translation, cropping, and contrast
tweaking. You randomly apply these functions to each training batch. You want to optimize your data
processing pipeline for run time and compute resources utilization. What should you do?
Options
Discussion
Probably B, since Keras generators can handle augmentations and batching together; I saw a similar approach in some exam reports.
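Not exactly the generator route, but for a rough idea of how random translation/crop/contrast can be applied per batch, here is a minimal sketch using Keras preprocessing layers inside a tf.data pipeline. The dummy tensors and sizes are placeholders, not from the question:

    import tensorflow as tf

    # Dummy data just so the sketch runs; stands in for the real defect images.
    images = tf.random.uniform([100, 256, 256, 3])
    labels = tf.zeros([100], dtype=tf.int32)

    # Random augmentations bundled as a small Keras model, applied per batch.
    augment = tf.keras.Sequential([
        tf.keras.layers.RandomTranslation(height_factor=0.1, width_factor=0.1),
        tf.keras.layers.RandomCrop(height=224, width=224),
        tf.keras.layers.RandomContrast(factor=0.2),
    ])

    ds = (
        tf.data.Dataset.from_tensor_slices((images, labels))
        .shuffle(1024)
        .batch(32)
        # Augment after batching so the random ops run on whole batches,
        # in parallel with training thanks to AUTOTUNE + prefetch.
        .map(lambda x, y: (augment(x, training=True), y),
             num_parallel_calls=tf.data.AUTOTUNE)
        .prefetch(tf.data.AUTOTUNE)
    )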
Q: 14
You recently deployed a scikit-learn model to a Vertex AI endpoint. You are now testing the model on
live production traffic. While monitoring the endpoint, you discover twice as many requests per hour
as expected throughout the day. You want the endpoint to efficiently scale when demand
increases in the future to prevent users from experiencing high latency. What should you do?
Options
Discussion
B is the way to go. Setting minReplicaCount helps Vertex AI autoscale properly for spikes without risking high latency. The other choices don't directly address scaling for unpredictable traffic. Pretty sure about this, but open to corrections if I'm missing something.
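For reference, this is roughly what that looks like with the Vertex AI Python SDK; the project, model ID, machine type, and replica numbers are placeholders, not values from the question:

    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1")  # placeholders

    model = aiplatform.Model("1234567890")  # hypothetical model resource ID

    # When max_replica_count is higher than min_replica_count, Vertex AI
    # autoscaling adds replicas as traffic grows; min_replica_count keeps some
    # warm capacity so spikes don't hit cold-start latency.
    endpoint = model.deploy(
        machine_type="n1-standard-4",
        min_replica_count=2,
        max_replica_count=10,
    )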
Q: 15
You developed a custom model by using Vertex AI to predict your application's user churn rate. You
are using Vertex AI Model Monitoring for skew detection. The training data stored in BigQuery
contains two sets of features: demographic and behavioral. You later discover that two separate
models trained on each set perform better than the original model.
You need to configure a new model monitoring pipeline that splits traffic among the two models. You
want to use the same prediction-sampling-rate and monitoring-frequency for each model. You also
want to minimize management effort. What should you do?
Options
Discussion
If the "minimize management effort" part wasn't required, would C make more sense than D here?
Q: 16
You need to develop an image classification model by using a large dataset that contains labeled
images in a Cloud Storage Bucket. What should you do?
Options
Discussion
It's C if "best" means lowest effort, but is there a code requirement that would make A or B more suitable?
I don’t think it’s C. B fits better if you have to use TensorFlow Extended for custom preprocessing or model logic.
Q: 17
You have created a Vertex AI pipeline that includes two steps. The first step preprocesses 10 TB of data,
completes in about 1 hour, and saves the result in a Cloud Storage bucket. The second step uses the
processed data to train a model. You need to update the model's code to allow you to test different
algorithms. You want to reduce pipeline execution time and cost, while also minimizing pipeline
changes. What should you do?
Options
Discussion
Not sure about that; pretty sure A is better here. If you add a pipeline parameter and a step that decides whether to preprocess or not, you can skip preprocessing if the data hasn't changed and just go straight to training. That way, you're not redoing heavy ETL work every time, and the change is minimal. But open to other ideas if I'm missing something obvious?
I'd pick C here. More CPU and RAM for the preprocessing step should speed things up without changing the pipeline much, I think. Disagree?
D (I'm a bit confused, but caching seems to avoid rerunning preprocessing, which cuts cost and keeps things simple?)
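On the caching point, here's a minimal sketch of how execution caching gets turned on when submitting a Vertex AI pipeline run; the display name and template path are placeholders, not from the question:

    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1")  # placeholders

    # With caching enabled, a step whose inputs, parameters, and component spec
    # are unchanged is skipped and its previous output (the preprocessed data in
    # Cloud Storage) is reused, so only the updated training step actually reruns.
    job = aiplatform.PipelineJob(
        display_name="preprocess-and-train",  # illustrative
        template_path="pipeline.json",        # compiled pipeline spec (illustrative)
        enable_caching=True,
    )
    job.run()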
Q: 18
You are training an ML model on a large dataset. You are using a TPU to accelerate the training
process. You notice that the training process is taking longer than expected. You discover that the TPU
is not reaching its full capacity. What should you do?
Options
Discussion
Can't believe Google still makes us guess at this stuff. It's D because TPUs are designed to chew through big batches, so if utilization is low, the batch size is probably too small. Bumping it up lets you push more data per step and make better use of the parallel cores. I think that's the main idea here, unless I'm missing something.
Not A; D. A larger batch size helps TPUs handle more data each step, so you can max out the hardware.
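If it helps, a minimal TensorFlow sketch of the usual pattern: scale the global batch size with the number of TPU cores. The per-core batch value and the dummy dataset are placeholders, and this assumes a Cloud TPU VM:

    import tensorflow as tf

    # Connect to the TPU and build a distribution strategy.
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.TPUStrategy(resolver)

    # Scale the global batch size with the number of replicas so every core gets
    # a full per-core batch; small global batches leave cores idle.
    per_replica_batch = 128
    global_batch = per_replica_batch * strategy.num_replicas_in_sync

    # Dummy dataset standing in for the large training set.
    train_ds = tf.data.Dataset.from_tensor_slices(tf.random.uniform([1024, 64]))
    train_ds = train_ds.batch(global_batch, drop_remainder=True)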
Q: 19
You are an ML engineer at a bank. You have developed a binary classification model using AutoML
Tables to predict whether a customer will make loan payments on time. The output is used to
approve or reject loan requests. One customer’s loan request has been rejected by your model, and
the bank’s risks department is asking you to provide the reasons that contributed to the model’s
decision. What should you do?
Options
Discussion
It's A. I had something like this in a mock before, and local feature importance is what they want for a specific prediction.
Not C; A is right here. But is the question asking for the specific reasons behind this one customer's decision or for general model behavior? If they wanted global (whole-model) insights, C could look tempting.
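Just to make the local-vs-global distinction concrete: if the model were served behind a Vertex AI endpoint with explanations enabled, a per-request call along these lines returns attributions for that one rejected application. The endpoint ID and feature values are made up for illustration:

    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1")  # placeholders

    endpoint = aiplatform.Endpoint("1234567890")  # hypothetical endpoint ID

    # explain() returns the prediction plus *local* feature attributions for this
    # single instance, i.e. how much each feature pushed the score toward "reject".
    response = endpoint.explain(
        instances=[{"income": 42000, "loan_amount": 15000, "tenure_months": 8}]  # illustrative
    )
    for explanation in response.explanations:
        for attribution in explanation.attributions:
            print(attribution.feature_attributions)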
Q: 20
You are working on a Neural Network-based project. The dataset provided to you has columns with
different ranges. While preparing the data for model training, you discover that gradient
optimization is having difficulty moving weights to a good solution. What should you do?
Options
Discussion
It's A, but if the question said 'best first step,' I'd rethink for B.
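Assuming the option being picked here is the usual fix of normalizing the columns to comparable ranges, a minimal Keras sketch (dummy values, not from the question) looks like this:

    import numpy as np
    import tensorflow as tf

    # Dummy tabular data with wildly different column ranges (placeholder values).
    X_train = np.array([[1000.0, 0.02], [50000.0, 0.90], [20000.0, 0.45]], dtype=np.float32)

    # The Normalization layer learns per-column mean/variance and rescales inputs
    # to comparable ranges, which keeps gradient updates well conditioned.
    norm = tf.keras.layers.Normalization(axis=-1)
    norm.adapt(X_train)

    model = tf.keras.Sequential([
        norm,
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])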