The best option for reducing pipeline execution time and cost while minimizing pipeline changes is to enable caching for the pipeline job and disable caching for the model training step. This lets you use Vertex AI Pipelines to reuse the output of the data preprocessing step and avoid unnecessary recomputation. Vertex AI Pipelines is a service that orchestrates machine learning workflows on Vertex AI: it can run preprocessing and training steps on custom Docker images, and evaluate, deploy, and monitor the resulting model. Caching is a Vertex AI Pipelines feature that stores the output of a pipeline step and skips the step's execution if its input parameters and code have not changed. Caching reduces execution time and cost, because the same step is never re-run with the same input and code, and it minimizes pipeline changes, because no steps or parameters need to be added or removed. In this scenario, the pipeline has two steps: the first preprocesses 10 TB of data, completes in about 1 hour, and saves the result to a Cloud Storage bucket; the second trains a model on the processed data. You can update the model's code to test different algorithms and run the pipeline job with caching enabled. The job will reuse the cached output of the preprocessing step and skip its execution, while the training step, with caching disabled, always re-runs with the updated code. This way, you reduce pipeline execution time and cost while minimizing pipeline changes.
The other options are not as good as option D, for the following reasons:
Option A: Adding a pipeline parameter and an additional pipeline step that, depending on the parameter value, either conducts or skips data preprocessing before starting model training would require more skills and steps than enabling caching for the pipeline job and disabling it for the model training step. A pipeline parameter is a variable that controls the input or output of a pipeline step; it lets you customize the pipeline's logic and experiment with different values. An additional pipeline step is a new instance of a pipeline component that performs part of the workflow, such as data preprocessing or model training; it extends the pipeline's functionality at the cost of added complexity. To implement this option, you would need to write code to define the pipeline parameter, create the additional step, implement the conditional logic, and compile and run the pipeline. Moreover, this option would not reuse the output of the preprocessing step from the cache, but rather from the Cloud Storage bucket, which can increase data transfer and access costs.
Option B: Creating another pipeline without the preprocessing step and hardcoding the preprocessed Cloud Storage file location for model training would also require more skills and steps than enabling caching for the pipeline job and disabling it for the model training step. Such a pipeline includes only the model training step and uses the preprocessed data in the Cloud Storage bucket as its input, which does avoid re-running preprocessing every time. However, you would need to write code to create the new pipeline, remove the preprocessing step, hardcode the Cloud Storage file location, and compile and run the pipeline. Moreover, this option would not reuse the output of the preprocessing step from the cache, but rather from the Cloud Storage bucket, which can increase data transfer and access costs. Furthermore, it would create a second pipeline, which increases maintenance and management costs.
Option C: Configuring a machine with more CPU and RAM from the compute-optimized machine family for the data preprocessing step would not reduce pipeline execution time and cost while minimizing pipeline changes; it would instead increase cost and complexity. A compute-optimized machine has a high ratio of CPU cores to memory and provides high performance and scalability for compute-intensive workloads, so it could speed up the preprocessing step. However, you would need to write code to configure the machine type parameters for the preprocessing step and compile and run the pipeline. Moreover, machines with more CPU and RAM from the compute-optimized family are more expensive than smaller machines from other families, which raises the per-run cost. Most importantly, this option would not reuse the output of the preprocessing step from the cache; it would re-run the full preprocessing step on every pipeline execution, which keeps execution time and cost high.
Reference:
Preparing for Google Cloud Certification: Machine Learning Engineer, Course 3: Production ML Systems, Week 3: MLOps
Google Cloud Professional Machine Learning Engineer Exam Guide, Section 3: Scaling ML models in production, 3.2 Automating ML workflows
Official Google Cloud Certified Professional Machine Learning Engineer Study Guide, Chapter 6: Production ML Systems, Section 6.4: Automating ML Workflows
Vertex AI Pipelines
Caching
Pipeline parameters
Machine types