Free Professional-Machine-Learning-Engineer Practice Questions

Google Machine Learning Engineer

Q: 1

You work as an analyst at a large banking firm. You are developing a robust, scalable ML pipeline to train several regression and classification models. Your primary focus for the pipeline is model interpretability. You want to productionize the pipeline as quickly as possible. What should you do?

Options

Q: 2

You work for a bank. You have been asked to develop an ML model that will support loan application decisions. You need to determine which Vertex AI services to include in the workflow. You want to track the model's training parameters and the metrics per training epoch. You plan to compare the performance of each version of the model to determine the best model based on your chosen metrics. Which Vertex AI services should you use?

Options

Q: 3

You developed a custom model by using Vertex AI to forecast the sales of your company's products based on historical transactional data. You anticipate changes in the feature distributions and the correlations between the features in the near future. You also expect to receive a large volume of prediction requests. You plan to use Vertex AI Model Monitoring for drift detection and you want to minimize the cost. What should you do?

Options

Correct Answer:

Explanation

The best option for using Vertex AI Model Monitoring for drift detection and minimizing the cost is to

use the features and the feature attributions for monitoring, and set a prediction-sampling-rate value

that is closer to 0 than 1. This option allows you to leverage the power and flexibility of Google Cloud

to detect feature drift in the input predict requests for custom models, and reduce the storage and

computation costs of the model monitoring job. Vertex AI Model Monitoring is a service that monitors the prediction input data of deployed models for feature skew and drift. Feature drift occurs when the

feature data distribution in production changes over time. If the original training data is not available,

you can enable drift detection to monitor your models for feature drift. Vertex AI Model Monitoring

uses TensorFlow Data Validation (TFDV) to calculate the distributions and distance scores for each

feature, and compares them with a baseline distribution. For skew detection, the baseline is the statistical distribution of the feature’s values in the training data. For drift detection, the baseline is the distribution of the feature’s values seen in production in the recent past. If

the distance score for a feature exceeds an alerting threshold that you set, Vertex AI Model

Monitoring sends you an email alert. However, if you use a custom model, you can also enable

feature attribution monitoring, which can provide more insights into the feature drift. Feature

attribution monitoring analyzes the feature attributions, which are the contributions of each feature

to the prediction output. Feature attribution monitoring can help you identify the features that have

the most impact on the model performance, and the features that have the most significant drift

over time. Feature attribution monitoring can also help you understand the relationship between the

features and the prediction output, and the correlation between the features [1]. The prediction-sampling-rate is a parameter that determines the percentage of prediction requests that are logged

and analyzed by the model monitoring job. Using a lower prediction-sampling-rate can reduce the

storage and computation costs of the model monitoring job, but also the quality and validity of the

data. Using a lower prediction-sampling-rate can introduce sampling bias and noise into the data,

and make the model monitoring job miss some important features or patterns of the data. However,

using a higher prediction-sampling-rate can increase the storage and computation costs of the model

monitoring job, and also the amount of data that needs to be processed and analyzed. Therefore,

there is a trade-off between the prediction-sampling-rate and the cost and accuracy of the model

monitoring job, and the optimal prediction-sampling-rate depends on the business objective and the

data characteristics [2]. By using the features and the feature attributions for monitoring, and setting a

prediction-sampling-rate value that is closer to 0 than 1, you can use Vertex AI Model Monitoring for

drift detection and minimize the cost.
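The interplay between the sampling rate and the distance score can be sketched in plain Python. This is only an illustration under stated assumptions: it uses Jensen-Shannon divergence as the distance score for a numerical feature, and the helper names, bin edges, synthetic data, and alert threshold are all hypothetical rather than taken from the service.

```python
import math
import random

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits between two discrete distributions."""
    def kl(a, b):
        # Terms with a == 0 contribute nothing; b is never 0 where a > 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    mix = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)

def histogram(values, edges):
    """Normalized histogram; values outside the edges are ignored."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]

def sample_requests(requests, sampling_rate, rng):
    """Keep each request with probability sampling_rate."""
    return [r for r in requests if rng.random() < sampling_rate]

rng = random.Random(0)
edges = list(range(-6, 9))  # hypothetical bin edges
baseline = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
production = [rng.gauss(1.5, 1.0) for _ in range(10_000)]  # shifted feature

# A sampling rate close to 0 logs only a small share of the requests.
logged = sample_requests(production, sampling_rate=0.1, rng=rng)
drift_score = js_divergence(histogram(baseline, edges), histogram(logged, edges))
ALERT_THRESHOLD = 0.1  # hypothetical value
print(len(logged), round(drift_score, 3), drift_score > ALERT_THRESHOLD)
```

With a large request volume, even a small sample is usually enough to reveal a clear shift, which is why a low sampling rate can cut cost without hiding the drift.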

The other options are not as good as option D, for the following reasons:

Option A: Using the features for monitoring and setting a monitoring-frequency value that is higher

than the default would not enable feature attribution monitoring, and could increase the cost of the

model monitoring job. The monitoring-frequency is a parameter that determines how often the

model monitoring job analyzes the logged prediction requests and calculates the distributions and

distance scores for each feature. Using a higher monitoring-frequency can increase the frequency

and timeliness of the model monitoring job, but also the computation costs of the model monitoring

job. Moreover, using the features for monitoring would not enable feature attribution monitoring,

which can provide more insights into the feature drift and the model performance [1].

Option B: Using the features for monitoring and setting a prediction-sampling-rate value that is

closer to 1 than 0 would not enable feature attribution monitoring, and could increase the cost of the

model monitoring job. The prediction-sampling-rate is a parameter that determines the percentage

of prediction requests that are logged and analyzed by the model monitoring job. Using a higher

prediction-sampling-rate can increase the quality and validity of the data, but also the storage and

computation costs of the model monitoring job. Moreover, using the features for monitoring would

not enable feature attribution monitoring, which can provide more insights into the feature drift and

the model performance [1][2].

Option C: Using the features and the feature attributions for monitoring and setting a monitoring-

frequency value that is lower than the default would enable feature attribution monitoring, but

could reduce the frequency and timeliness of the model monitoring job. The monitoring-frequency is

a parameter that determines how often the model monitoring job analyzes the logged prediction

requests and calculates the distributions and distance scores for each feature. Using a lower

monitoring-frequency can reduce the computation costs of the model monitoring job, but also the

frequency and timeliness of the model monitoring job. This can make the model monitoring job less

responsive and effective in detecting and alerting on feature drift [1].

Reference:

Preparing for Google Cloud Certification: Machine Learning Engineer, Course 3: Production ML

Systems, Week 4: Evaluation

Google Cloud Professional Machine Learning Engineer Exam Guide, Section 3: Scaling ML models in

production, 3.3 Monitoring ML models in production

Official Google Cloud Certified Professional Machine Learning Engineer Study Guide, Chapter 6:

Production ML Systems, Section 6.3: Monitoring ML Models

Using Model Monitoring

Understanding the score threshold slider

Q: 4

You work for a pet food company that manages an online forum. Customers upload photos of their pets on the forum to share with others. About 20 photos are uploaded daily. You want to automatically and in near real time detect whether each uploaded photo has an animal. You want to prioritize time and minimize cost of your application development and deployment. What should you do?

Options

Correct Answer:

Explanation

Cloud Vision API is a service that allows you to analyze images using pre-trained machine learning models [1]. You can use Cloud Vision API to perform various tasks, such as face detection, text extraction, logo recognition, and object localization [1]. Object localization is a feature that allows you to detect multiple objects in an image and draw bounding boxes around them [2]. You can also get the labels and confidence scores for each detected object [2].

By sending user-submitted images to the Cloud Vision API, you can use object localization to identify

all objects in the image and compare the results against a list of animals. You can use

the OBJECT_LOCALIZATION feature type in the AnnotateImageRequest to request object

localization [3]. You can then use the localizedObjectAnnotations field in

the AnnotateImageResponse to get the list of detected objects, their labels, and their confidence

scores. You can compare the labels with a predefined list of animals, such as dogs, cats, birds, etc.,

and determine whether the image has an animal or not.
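The comparison step can be sketched as follows. The dictionary below only mirrors the JSON shape of a single AnnotateImageResponse (a localizedObjectAnnotations list whose entries carry a name and a score). The animal list, the score threshold, and the function name are hypothetical choices for illustration, and no call to the API is made here.

```python
# Hypothetical allow-list of labels treated as animals.
ANIMAL_LABELS = {"animal", "dog", "cat", "bird"}

def has_animal(response, min_score=0.5):
    """Return True if any localized object is a listed animal with enough confidence."""
    for obj in response.get("localizedObjectAnnotations", []):
        name = obj.get("name", "").lower()
        if name in ANIMAL_LABELS and obj.get("score", 0.0) >= min_score:
            return True
    return False

# Hand-written sample in the shape of one AnnotateImageResponse.
sample_response = {
    "localizedObjectAnnotations": [
        {"name": "Dog", "score": 0.92},
        {"name": "Chair", "score": 0.81},
    ]
}

print(has_animal(sample_response))
```

Keeping the label list in application code means new animal types can be added without retraining or redeploying any model.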

This option is the best for your scenario, because it allows you to automatically and in near real time

detect whether each uploaded photo has an animal, without requiring any manual labeling, model

training, or model deployment. You can also prioritize time and minimize cost of your application

development and deployment, as you can use the Cloud Vision API as a ready-to-use service, without

needing any machine learning expertise or infrastructure.

The other options are not suitable for your scenario, because they either require manual labeling,

model training, or model deployment, which would increase the time and cost of your application

development and deployment, or they use object detection models, which are more complex and

computationally expensive than object localization models, and are not necessary for your simple

task of detecting whether an image has an animal or not.

Reference:

Cloud Vision API | Google Cloud

Object localization | Cloud Vision API | Google Cloud

AnnotateImageRequest | Cloud Vision API | Google Cloud

AnnotateImageResponse | Cloud Vision API | Google Cloud

Q: 5

You recently developed a deep learning model using Keras, and now you are experimenting with different training strategies. First, you trained the model using a single GPU, but the training process was too slow. Next, you distributed the training across 4 GPUs using tf.distribute.MirroredStrategy (with no other changes), but you did not observe a decrease in training time. What should you do?

Options

Correct Answer:

Explanation

Option A is incorrect because distributing the dataset with

tf.distribute.Strategy.experimental_distribute_dataset is not the most effective way to decrease the

training time. This method allows you to distribute your dataset across multiple devices or machines,

by creating a tf.data.Dataset instance that can be iterated over in parallel [1]. However, this option may

not improve the training time significantly, as it does not change the amount of data or computation

that each device or machine has to process. Moreover, this option may introduce additional

overhead or complexity, as it requires you to handle the data sharding, replication, and

synchronization across the devices or machines [1].

Option B is incorrect because creating a custom training loop is not the easiest way to decrease the

training time. A custom training loop is a way to implement your own logic for training your model,

by using low-level TensorFlow APIs, such as tf.GradientTape, tf.Variable, or tf.function [2]. A custom

training loop may give you more flexibility and control over the training process, but it also requires

more effort and expertise, as you have to write and debug the code for each step of the training loop,

such as computing the gradients, applying the optimizer, or updating the metrics [2]. Moreover, a

custom training loop may not improve the training time significantly, as it does not change the

amount of data or computation that each device or machine has to process.

Option C is incorrect because moving to a TPU with tf.distribute.TPUStrategy is not the most direct way to decrease the training time. A TPU (Tensor Processing Unit) is a custom hardware accelerator designed for high-performance ML workloads [3]. tf.distribute.TPUStrategy is a distribution strategy that allows you to distribute your training across TPU cores and can be used with high-level TensorFlow APIs, such as Keras [4]. However, switching hardware does not address why the 4-GPU run was no faster than the single-GPU run. Moreover, this option may require significant code changes, as TPUs have different requirements and limitations than GPUs.

Option D is correct because increasing the batch size is the most direct way to decrease the training time. The batch size is a hyperparameter that determines how many samples of data are processed in each iteration of the training loop. With tf.distribute.MirroredStrategy, each global batch is split evenly across the replicas, so if the batch size is left unchanged, each of the 4 GPUs processes only a quarter of the original batch per step and the number of steps per epoch stays the same. Scaling the global batch size with the number of GPUs reduces the number of iterations needed to train the model and lets each device process a full batch in parallel. Increasing the batch size is also easy to implement, as it only requires changing a single hyperparameter. However, increasing the batch size may also affect the convergence and the accuracy of the model, so it is important to find a batch size (and often an adjusted learning rate) that balances the trade-off between the training time and the model performance.
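The arithmetic behind Option D can be sketched in a few lines. This is a minimal illustration, not TensorFlow code: the function names and the dataset size of 51,200 examples are hypothetical, and in a real script the replica count would typically come from the strategy object (for example its num_replicas_in_sync attribute).

```python
import math

def global_batch_size(per_replica_batch, num_replicas):
    """Global batch that keeps every replica working on a full per-replica batch."""
    return per_replica_batch * num_replicas

def steps_per_epoch(num_examples, global_batch):
    """Number of training steps needed to see every example once."""
    return math.ceil(num_examples / global_batch)

NUM_EXAMPLES = 51_200      # hypothetical dataset size
SINGLE_GPU_BATCH = 64      # hypothetical original batch size
NUM_GPUS = 4

# Unchanged batch size: the 64 examples are split 16 per GPU, steps stay the same.
unchanged_steps = steps_per_epoch(NUM_EXAMPLES, SINGLE_GPU_BATCH)

# Scaled batch size: each GPU again sees 64 examples, and steps drop by 4x.
scaled_batch = global_batch_size(SINGLE_GPU_BATCH, NUM_GPUS)
scaled_steps = steps_per_epoch(NUM_EXAMPLES, scaled_batch)

print(unchanged_steps, scaled_batch, scaled_steps)
```

The step count falls in proportion to the number of replicas only when the global batch is scaled, which is the change the original 4-GPU run was missing.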

Reference:

tf.distribute.Strategy.experimental_distribute_dataset

Custom training loop

TPU overview

tf.distribute.TPUStrategy

Vertex AI Training accelerators

TPU programming model

Batch size and learning rate

Keras overview

tf.distribute.MirroredStrategy

Vertex AI Training overview

TensorFlow overview

Q: 6

You need to train a computer vision model that predicts the type of government ID present in a given image using a GPU-powered virtual machine on Compute Engine. You use the following parameters:

• Optimizer: SGD
• Image shape: 224x224
• Batch size: 64
• Epochs: 10
• Verbose: 2

During training you encounter the following error: ResourceExhaustedError: out of memory (OOM) when allocating tensor. What should you do?

Options

Correct Answer:

Explanation

A ResourceExhaustedError: out of memory (OOM) when allocating tensor is an error that occurs

when the GPU runs out of memory while trying to allocate memory for a tensor. A tensor is a multi-

dimensional array of numbers that represents the data or the parameters of a machine learning

model. The size and shape of a tensor depend on various factors, such as the input data, the model

architecture, the batch size, and the optimization algorithm [1].

For the use case of training a computer vision model that predicts the type of government ID present

in a given image using a GPU-powered virtual machine on Compute Engine, the best option to

resolve the error is to reduce the batch size. The batch size is a parameter that determines how many

input examples are processed at a time by the model. A larger batch size can improve the model’s

accuracy and stability, but it also requires more memory and computation. A smaller batch size can

reduce the memory and computation requirements, but it may also affect the model’s performance

and convergence [2].

By reducing the batch size, the GPU allocates smaller tensors for the inputs and intermediate activations at each step, and avoids running out of memory. Each step also does less work, although more steps are then needed per epoch. However, reducing the batch size too much may also have some drawbacks, such as increasing the noise and variance of the gradient updates, and slowing down the convergence of the model. Therefore, the optimal batch size should be chosen based on the trade-off between memory, computation, and performance [3].
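A back-of-the-envelope calculation shows why memory scales with the batch size. The sketch below assumes 3 color channels and 32-bit floats, neither of which is stated in the question, and it covers only the input batch; intermediate activations scale with the batch size in the same way and usually take far more memory.

```python
def batch_input_bytes(batch_size, height, width, channels=3, bytes_per_value=4):
    """Bytes needed to hold one input batch (channels and dtype size are assumptions)."""
    return batch_size * height * width * channels * bytes_per_value

full = batch_input_bytes(64, 224, 224)   # batch size from the question
half = batch_input_bytes(32, 224, 224)   # halved batch size

print(full, full / 2**20)  # bytes and MiB for batch size 64
print(half, half / 2**20)  # bytes and MiB for batch size 32
```

Halving the batch size halves every batch-dependent tensor, which is why it is the most direct remedy for the error.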

The other options are not as effective as option B, because they are not directly related to the

memory allocation of the GPU. Option A, changing the optimizer, may affect the speed and quality of

the optimization process, but it may not reduce the memory usage of the model. Option C, changing

the learning rate, may affect the convergence and stability of the model, but it may not reduce the

memory usage of the model. Option D, reducing the image shape, may reduce the size of the input

tensor, but it may also reduce the quality and resolution of the image, and affect the model’s

accuracy. Therefore, option B, reducing the batch size, is the best answer for this question.

Reference:

ResourceExhaustedError: OOM when allocating tensor with shape - Stack Overflow

How does batch size affect model performance and training time? - Stack Overflow

How to choose an optimal batch size for training a neural network? - Stack Overflow

Q: 7

Your data science team has requested a system that supports scheduled model retraining, Docker containers, and a service that supports autoscaling and monitoring for online prediction requests. Which platform components should you choose for this system?

Options

Correct Answer:

Explanation

Option A is incorrect because Vertex AI Pipelines and App Engine do not meet all the requirements of

the system. Vertex AI Pipelines is a service that allows you to create, run, and manage ML workflows

using TensorFlow Extended (TFX) components or custom components [1]. App Engine is a service that allows you to build and deploy scalable web applications using standard or flexible environments [2]. However, App Engine does not support Docker containers in the standard environment, and does not provide a dedicated service for online prediction and monitoring of ML models [3].

Option B is correct because Vertex AI Pipelines, Vertex AI Prediction, and Vertex AI Model Monitoring

meet all the requirements of the system. Vertex AI Prediction is a service that allows you to deploy

and serve ML models for online or batch prediction, with support for autoscaling and custom containers [4]. Vertex AI Model Monitoring is a service that allows you to monitor your deployed models for training-serving skew and prediction drift, and get alerts when thresholds are exceeded [5].

Option C is incorrect because Cloud Composer, BigQuery ML, and Vertex AI Prediction do not meet

all the requirements of the system. Cloud Composer is a service that allows you to create, schedule,

and manage workflows using Apache Airflow. BigQuery ML is a service that allows you to create and

use ML models within BigQuery using SQL queries. However, BigQuery ML does not support custom

containers, and Vertex AI Prediction does not support scheduled model retraining or model

monitoring.

Option D is incorrect because Cloud Composer, Vertex AI Training with custom containers, and App

Engine do not meet all the requirements of the system. Vertex AI Training is a service that allows you

to train ML models using built-in algorithms or custom containers. However, Vertex AI Training does

not support online prediction or model monitoring, and App Engine does not support Docker

containers in the standard environment or online prediction and monitoring of ML models [3].

Reference:

Vertex AI Pipelines overview

App Engine overview

Choosing an App Engine environment

Vertex AI Prediction overview

Vertex AI Model Monitoring overview

Cloud Composer overview

BigQuery ML overview

BigQuery ML limitations

Vertex AI Training overview

Q: 8

Your team has a model deployed to a Vertex AI endpoint. You have created a Vertex AI pipeline that automates the model training process and is triggered by a Cloud Function. You need to prioritize keeping the model up-to-date, but also minimize retraining costs. How should you configure retraining?

Options

Q: 9

You are developing a Kubeflow pipeline on Google Kubernetes Engine. The first step in the pipeline is to issue a query against BigQuery. You plan to use the results of that query as the input to the next step in your pipeline. You want to achieve this in the easiest way possible. What should you do?

Options

Correct Answer:

Explanation

Kubeflow is an open source platform for developing, orchestrating, deploying, and running scalable

and portable machine learning workflows on Kubernetes. Kubeflow Pipelines is a component of

Kubeflow that allows you to build and manage end-to-end machine learning pipelines using a

graphical user interface or a Python-based domain-specific language (DSL). Kubeflow Pipelines can

help you automate and orchestrate your machine learning workflows, and integrate with various

Google Cloud services and tools [1].

One of the Google Cloud services that you can use with Kubeflow Pipelines is BigQuery, which is a

serverless, scalable, and cost-effective data warehouse that allows you to run fast and complex

queries on large-scale data. BigQuery can help you analyze and prepare your data for machine

learning, and store and manage your machine learning models [2].

To execute a query against BigQuery as the first step in your Kubeflow pipeline, and use the results of

that query as the input to the next step in your pipeline, the easiest way to do that is to use the

BigQuery Query Component, which is a pre-built component that you can find in the Kubeflow

Pipelines repository on GitHub. The BigQuery Query Component allows you to run a SQL query on

BigQuery, and output the results as a table or a file. You can use the component’s URL to load the

component into your pipeline, and specify the query and the output parameters. You can then use

the output of the component as the input to the next step in your pipeline, such as a data processing

or a model training step [3].

The other options are not as easy or feasible. Using the BigQuery console to execute your query and

then save the query results into a new BigQuery table is not a good idea, as it does not integrate with

your Kubeflow pipeline, and requires manual intervention and duplication of data. Writing a Python

script that uses the BigQuery API to execute queries against BigQuery is not ideal, as it requires

writing custom code and handling authentication and error handling. Using the Kubeflow Pipelines

DSL to create a custom component that uses the Python BigQuery client library to execute queries is

not optimal, as it requires creating and packaging a Docker container image for the component, and

testing and debugging the component.

Reference: 1: Kubeflow Pipelines overview 2: BigQuery overview 3: BigQuery Query Component

Q: 10

You are experimenting with a built-in distributed XGBoost model in Vertex AI Workbench user-managed notebooks. You use BigQuery to split your data into training and validation sets using the following queries:

CREATE OR REPLACE TABLE `myproject.mydataset.training` AS (SELECT * FROM `myproject.mydataset.mytable` WHERE RAND() < 0.8);

CREATE OR REPLACE TABLE `myproject.mydataset.validation` AS (SELECT * FROM `myproject.mydataset.mytable` WHERE RAND() < 0.2);

After training the model, you achieve an area under the receiver operating characteristic curve (AUC ROC) value of 0.8, but after deploying the model to production, you notice that your model performance has dropped to an AUC ROC value of 0.65. What problem is most likely occurring?

Options

Q: 11

You work at a subscription-based company. You have trained an ensemble of trees and neural networks to predict customer churn, which is the likelihood that customers will not renew their yearly subscription. The average prediction is a 15% churn rate, but for a particular customer the model predicts that they are 70% likely to churn. The customer has a product usage history of 30%, is located in New York City, and became a customer in 1997. You need to explain the difference between the actual prediction, a 70% churn rate, and the average prediction. You want to use Vertex Explainable AI. What should you do?

Options

Correct Answer:

Explanation

Option A is incorrect because training local surrogate models to explain individual predictions is not a

feature of Vertex Explainable AI, but rather a general technique for interpreting black-box

models. Local surrogate models are simpler models that approximate the behavior of the original

model around a specific input [1].

Option B is correct because configuring sampled Shapley explanations on Vertex Explainable AI is a way to explain the difference between the actual prediction and the average prediction for a given input. Sampled Shapley explanations are based on the Shapley value, which is a game-theoretic concept that measures how much each feature contributes to the prediction [2]. The attributions sum to the difference between the prediction and the baseline prediction, and the method does not need gradients, so it works for non-differentiable models such as an ensemble of trees and neural networks. Vertex Explainable AI supports sampled Shapley explanations for tabular data, such as customer churn [3].

Option C is incorrect because configuring integrated gradients explanations on Vertex Explainable AI is not suitable for this model. Integrated gradients explanations are based on the idea of computing the gradients of the prediction with respect to the input features along a path from a baseline input to the actual input [4]. The method therefore requires a differentiable model, such as a neural network, and the tree-based part of the ensemble is not differentiable [3].

Option D is incorrect because measuring the effect of each feature as the weight of the feature

multiplied by the feature value is not a valid way to explain the difference between the actual

prediction and the average prediction for a given input. This method assumes that the model is linear

and additive, which is not the case for an ensemble of trees and neural networks. Moreover, this

method does not account for the interactions between features or the non-linearity of the model [5].
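The idea behind Shapley attributions can be illustrated with a small exact computation. This is a sketch under stated assumptions, not the Vertex Explainable AI implementation: the toy churn model, its coefficients, the feature names, and the baseline values are all hypothetical, chosen so that the baseline prediction is 0.15 and the customer's prediction is 0.70. Sampled Shapley approximates the same quantity by sampling feature orderings instead of enumerating all of them.

```python
from itertools import permutations

def toy_churn_model(x):
    """Hypothetical non-linear churn score with a usage/tenure interaction."""
    score = 0.15 + 0.5 * max(0.0, 0.8 - x["usage"])
    if x["usage"] < 0.5 and x["tenure_years"] > 20:
        score += 0.3
    return score

def shapley_values(predict, instance, baseline):
    """Exact Shapley attributions: average marginal contribution over all orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = instance[name]   # switch this feature to its actual value
            updated = predict(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}

baseline = {"usage": 0.8, "tenure_years": 5, "is_nyc": 0}    # hypothetical average customer
customer = {"usage": 0.3, "tenure_years": 26, "is_nyc": 1}   # hypothetical encoding

attributions = shapley_values(toy_churn_model, customer, baseline)
gap = toy_churn_model(customer) - toy_churn_model(baseline)
print(attributions, sum(attributions.values()), gap)
```

The attributions add up to the gap between the customer's prediction and the baseline prediction, which is the property that makes this family of methods a fit for the question.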

Reference:

Local surrogate models

Shapley value

Vertex Explainable AI overview

Integrated gradients

Feature importance

Q: 12

You are developing ML models with AI Platform for image segmentation on CT scans. You frequently update your model architectures based on the newest available research papers, and have to rerun training on the same dataset to benchmark their performance. You want to minimize computation costs and manual intervention while having version control for your code. What should you do?

Options

Correct Answer:

Explanation

Developing ML models with AI Platform for image segmentation on CT scans requires a lot of

computation and experimentation, as image segmentation is a complex and challenging task that

involves assigning a label to each pixel in an image. Image segmentation can be used for various

medical applications, such as tumor detection, organ segmentation, or lesion localization [1].

To minimize the computation costs and manual intervention while having version control for the

code, one should use Cloud Build linked with Cloud Source Repositories to trigger retraining when

new code is pushed to the repository. Cloud Build is a service that executes your builds on Google

Cloud Platform infrastructure. Cloud Build can import source code from Cloud Source Repositories,

Cloud Storage, GitHub, or Bitbucket, execute a build to your specifications, and produce artifacts

such as Docker containers or Java archives [2].

Cloud Build allows you to set up automated triggers that start a build when changes are pushed to a

source code repository. You can configure triggers to filter the changes based on the branch, tag, or

file path [3].

Cloud Source Repositories is a service that provides fully managed private Git repositories on Google

Cloud Platform. Cloud Source Repositories allows you to store, manage, and track your code using

the Git version control system. You can also use Cloud Source Repositories to connect to other

Google Cloud services, such as Cloud Build, Cloud Functions, or Cloud Run [4].

To use Cloud Build linked with Cloud Source Repositories to trigger retraining when new code is

pushed to the repository, you need to do the following steps:

Create a Cloud Source Repository for your code, and push your code to the repository. You can use

the Cloud SDK, Cloud Console, or Cloud Source Repositories API to create and manage your

repository [5].

Create a Cloud Build trigger for your repository, and specify the build configuration and the trigger

settings. You can use the Cloud SDK, Cloud Console, or Cloud Build API to create and manage your

trigger.

Specify the steps of the build in a YAML or JSON file, such as installing the dependencies, running the

tests, building the container image, and submitting the training job to AI Platform. You can also use

the Cloud Build predefined or custom build steps to simplify your build configuration.

Push your new code to the repository, and the trigger will start the build automatically. You can

monitor the status and logs of the build using the Cloud SDK, Cloud Console, or Cloud Build API.

The other options are not as easy or feasible. Using Cloud Functions to identify changes to your code

in Cloud Storage and trigger a retraining job is not ideal, as Cloud Functions has limitations on the

memory, CPU, and execution time, and does not provide a user interface for managing and tracking

your builds. Using the gcloud command-line tool to submit training jobs on AI Platform when you

update your code is not optimal, as it requires manual intervention and does not leverage the

benefits of Cloud Build and its integration with Cloud Source Repositories. Creating an automated

workflow in Cloud Composer that runs daily and looks for changes in code in Cloud Storage using a

sensor is not relevant, as Cloud Composer is mainly designed for orchestrating complex workflows

across multiple systems, and does not provide a version control system for your code.

Reference: 1: Image segmentation 2: Cloud Build overview 3: Creating and managing build triggers 4: Cloud Source Repositories overview 5: Quickstart: Create a repository. See also: Quickstart: Create a build trigger, Configuring builds, Viewing build results

Q: 13

You work for the AI team of an automobile company, and you are developing a visual defect detection model using TensorFlow and Keras. To improve your model performance, you want to incorporate some image augmentation functions such as translation, cropping, and contrast tweaking. You randomly apply these functions to each training batch. You want to optimize your data processing pipeline for run time and compute resources utilization. What should you do?

Options

Correct Answer:

Explanation

The best option for optimizing the data processing pipeline for run time and compute resources

utilization is to embed the augmentation functions dynamically in the tf.data pipeline. This option

has the following advantages:

It allows the data augmentation to be performed on the fly, without creating or storing additional

copies of the data. This saves storage space and reduces the data transfer time.

It leverages the parallelism and performance of the tf.data API, which can efficiently apply the augmentation functions to many examples in parallel, using multiple CPU cores while the accelerator trains on the previous batch. The tf.data API also supports various optimization techniques, such as caching, prefetching,

and autotuning, to improve the data processing speed and reduce the latency.

It integrates seamlessly with the TensorFlow and Keras models, which can consume the tf.data datasets as inputs for training and evaluation. The tf.data API also supports various data formats,

such as images, text, audio, and video, and various data sources, such as files, databases, and web

services.
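A minimal sketch of the approach described above, assuming TensorFlow 2.x is installed. The image size, padding amount, contrast range, batch size, and the random tensors standing in for real images are all hypothetical. The random translation is approximated by padding and then randomly cropping back to the original size, which is one common way to do it rather than the only one.

```python
import tensorflow as tf

IMAGE_SIZE = 64   # hypothetical image height and width
PADDED_SIZE = 72  # hypothetical padded size used to simulate translation

def augment(image, label):
    """Randomly translate/crop and adjust contrast for a single example."""
    # Pad, then crop back at a random offset to approximate a random translation.
    image = tf.image.resize_with_crop_or_pad(image, PADDED_SIZE, PADDED_SIZE)
    image = tf.image.random_crop(image, size=[IMAGE_SIZE, IMAGE_SIZE, 3])
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
    return image, label

def build_dataset(images, labels, batch_size=8):
    """Input pipeline that augments on the fly, in parallel, with prefetching."""
    ds = tf.data.Dataset.from_tensor_slices((images, labels))
    ds = ds.shuffle(buffer_size=int(images.shape[0]))
    ds = ds.map(augment, num_parallel_calls=tf.data.AUTOTUNE)
    ds = ds.batch(batch_size)
    return ds.prefetch(tf.data.AUTOTUNE)

# Random tensors stand in for real defect images and labels.
images = tf.random.uniform([16, IMAGE_SIZE, IMAGE_SIZE, 3])
labels = tf.zeros([16], dtype=tf.int32)
train_ds = build_dataset(images, labels)

for batch_images, batch_labels in train_ds.take(1):
    print(batch_images.shape, batch_labels.shape)
```

Because the augmentation runs inside the map step, each epoch sees freshly randomized images and nothing extra is written to storage; the resulting dataset can be passed directly to a Keras model's fit method.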

The other options are less optimal for the following reasons:

Option B: Embedding the augmentation functions dynamically as part of Keras generators introduces limitations and overhead. Keras generators are Python generators that yield batches of data for training or evaluation. Because they execute ordinary Python code, they are constrained by the Python interpreter and cannot use the graph-level optimizations that tf.data provides, such as parallel map, prefetching, and autotuning. They also have more limited support than tf.data datasets when training is distributed across multiple devices or machines with the tf.distribute API.

Option C: Using Dataflow to create all possible augmentations and store them as TFRecords introduces additional complexity and cost. Dataflow is a fully managed service that runs Apache Beam pipelines for data processing and transformation. However, creating all possible augmentations requires generating and storing a large number of augmented images, which consumes storage space and incurs storage and network costs. It also requires writing and deploying a separate Dataflow pipeline, which can be tedious and time-consuming.

Option D: Using Dataflow to create the augmentations dynamically per training run and stage them as TFRecords introduces additional complexity and latency. This approach requires running a Dataflow pipeline every time the model is trained, which delays the start of training, and it still requires writing and deploying a separate Dataflow pipeline.

Reference:

[tf.data: Build TensorFlow input pipelines]

[Image augmentation | TensorFlow Core]

[Dataflow documentation]

Q: 14

You recently deployed a scikit-learn model to a Vertex AI endpoint. You are now testing the model on live production traffic. While monitoring the endpoint, you discover twice as many requests per hour as expected throughout the day. You want the endpoint to efficiently scale when the demand increases in the future to prevent users from experiencing high latency. What should you do?

Options

Correct Answer: B

Explanation

The best option for scaling the Vertex AI endpoint efficiently when the demand increases in the future is to configure an appropriate minReplicaCount value based on expected baseline traffic. Vertex AI is a unified platform for building and deploying machine learning solutions on Google Cloud. It can deploy a trained model to an online prediction endpoint, which provides low-latency predictions for individual instances. The minReplicaCount value specifies the minimum number of replicas that the endpoint must always have, regardless of the load. Because the observed traffic is twice what was expected, setting minReplicaCount to match the real baseline ensures that the endpoint has enough resources to serve normal traffic without high latency or errors, and leaves autoscaling to absorb increases above that baseline. You can set the minReplicaCount value when you deploy the model to the endpoint, or update it later. Vertex AI automatically scales the number of replicas up or down within the range of the minReplicaCount and maxReplicaCount values, based on the target utilization percentage and the autoscaling metric [1].
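The interaction between the replica bounds and the target utilization can be illustrated with a toy target-tracking rule. This is a simplified sketch under stated assumptions, not Vertex AI's actual autoscaling algorithm, which the explanation above does not specify; the function name, parameters, and numbers are hypothetical.

```python
import math


def desired_replicas(current_replicas, observed_utilization, target_utilization,
                     min_replicas, max_replicas):
    """Toy rule: resize the pool so average utilization moves toward the target.

    Utilization values are percentages. The result is always clamped to the
    [min_replicas, max_replicas] range, mirroring minReplicaCount and
    maxReplicaCount.
    """
    raw = math.ceil(current_replicas * observed_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))


# Busy endpoint: 2 replicas at 90% against a 60% target asks for 3 replicas.
busy = desired_replicas(2, 90, 60, min_replicas=2, max_replicas=10)

# Quiet endpoint: the rule would shrink to 1 replica, but the minimum holds it at 2.
quiet = desired_replicas(2, 10, 60, min_replicas=2, max_replicas=10)

# Spike beyond capacity: the rule asks for 8 replicas, but the maximum caps it at 6.
spike = desired_replicas(4, 120, 60, min_replicas=1, max_replicas=6)
```

In this model, the minimum replica count is what guarantees capacity for baseline traffic during quiet periods, while the target utilization only controls how early additional replicas are requested.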

The other options are not as good as option B, for the following reasons:

Option A: Deploying two models to the same endpoint and distributing requests among them evenly would not allow you to scale your endpoint efficiently when the demand increases in the future, and would increase the complexity and cost of the deployment. An endpoint is a resource that provides the service URL you use to request predictions. An endpoint can have one or more deployed models, each of which is a model version associated with physical resources and assigned a share of the traffic. Splitting traffic evenly between two deployed models spreads the current load, but it fixes the capacity at a new level instead of letting it follow demand, and you would have to deploy and manage the second model yourself. Moreover, this option does not use the autoscaling feature of Vertex AI, which automatically adjusts the number of replicas based on the traffic patterns [2].

Option C: Setting the target utilization percentage in the autoscalingMetricSpecs configuration to a higher value would not allow you to scale your endpoint efficiently when the demand increases in the future, and could cause errors or poor performance. The target utilization percentage specifies the desired utilization level of each replica. A higher target means that each replica must be busier before Vertex AI adds another one, which reduces the number of replicas and saves resources but delays scale-up and can cause high latency, low throughput, or resource exhaustion. Moreover, this option does not ensure that the endpoint has enough resources to handle the expected baseline traffic [1].

Option D: Changing the model's machine type to one that utilizes GPUs would not allow you to scale your endpoint efficiently when the demand increases in the future, and would increase the cost of the deployment. A machine type is a parameter that specifies the type of virtual machine that the prediction service uses for the deployed model. GPUs accelerate frameworks that are built to use them, but scikit-learn runs its computations on the CPU, so a GPU machine type would add cost without speeding up predictions. You would also need to redeploy the model with the new machine type. Moreover, this option does not change how the endpoint responds to traffic, which is governed by the autoscaling settings that adjust the number of replicas based on the traffic patterns [2].

Reference:

Configure compute resources for prediction | Vertex AI | Google Cloud

Deploy a model to an endpoint | Vertex AI | Google Cloud

Q: 15

You developed a custom model by using Vertex AI to predict your application's user churn rate. You are using Vertex AI Model Monitoring for skew detection. The training data stored in BigQuery contains two sets of features: demographic and behavioral. You later discover that two separate models trained on each set perform better than the original model. You need to configure a new model monitoring pipeline that splits traffic among the two models. You want to use the same prediction-sampling-rate and monitoring-frequency for each model. You also want to minimize management effort. What should you do?

Options

Q: 16

You need to develop an image classification model by using a large dataset that contains labeled images in a Cloud Storage Bucket. What should you do?

Options

Correct Answer: C

Explanation

The best option for developing an image classification model by using a large dataset that contains

labeled images in a Cloud Storage bucket is to import the labeled images as a managed dataset in

Vertex AI and use AutoML to train the model. This option allows you to leverage the power and

simplicity of Google Cloud to create and deploy a high-quality image classification model with

minimal code and configuration. Vertex AI is a unified platform for building and deploying machine

learning solutions on Google Cloud. Vertex AI can create a managed dataset from a Cloud Storage

bucket that contains labeled images, which can be used to train an AutoML model. AutoML is a

service that can automatically build and optimize machine learning models for various tasks, such as

image classification, object detection, natural language processing, and tabular data analysis.

AutoML can handle the complex aspects of machine learning, such as feature engineering, model

architecture, hyperparameter tuning, and model evaluation. AutoML can also evaluate, deploy, and

monitor the image classification model, and provide online or batch predictions. By using Vertex AI

and AutoML, users can develop an image classification model by using a large dataset with ease and

efficiency.

The other options are not as good as option C, for the following reasons:

Option A: Using Vertex AI Pipelines with the Kubeflow Pipelines SDK to create a pipeline that reads

the images from Cloud Storage and trains the model would require more skills and steps than using

Vertex AI and AutoML. Vertex AI Pipelines is a service that can orchestrate machine learning

workflows using Vertex AI. Vertex AI Pipelines can run preprocessing and training steps on custom

Docker images, and evaluate, deploy, and monitor the machine learning model. Kubeflow Pipelines

SDK is a Python library that can create and run pipelines on Vertex AI Pipelines or on Kubeflow, an

open-source platform for machine learning on Kubernetes. However, using Vertex AI Pipelines and

Kubeflow Pipelines SDK would require writing code, building Docker images, defining pipeline

components and steps, and managing the pipeline execution and artifacts. Moreover, Vertex AI

Pipelines and Kubeflow Pipelines SDK are not specialized for image classification, and users would

need to use other libraries or frameworks, such as TensorFlow or PyTorch, to build and train the

image classification model.

Option B: Using Vertex AI Pipelines with TensorFlow Extended (TFX) to create a pipeline that reads

the images from Cloud Storage and trains the model would require more skills and steps than using

Vertex AI and AutoML. TensorFlow Extended (TFX) is a framework that can create and run end-to-end

machine learning pipelines on TensorFlow, a popular library for building and training deep learning

models. TFX can preprocess the data, train and evaluate the model, validate and push the model,

and serve the model for online or batch predictions. However, using Vertex AI Pipelines and TFX

would require writing code, building Docker images, defining pipeline components and steps, and

managing the pipeline execution and artifacts. Moreover, TFX is not optimized for image

classification, and users would need to use other libraries or tools, such as TensorFlow Data

Validation, TensorFlow Transform, and TensorFlow Hub, to handle the image data and the model

architecture.

Option D: Converting the image dataset to a tabular format using Dataflow, loading the data into

BigQuery, and using BigQuery ML to train the model would not handle the image data properly and

could result in a poor model performance. Dataflow is a service that can create scalable and reliable

pipelines to process large volumes of data from various sources. Dataflow can preprocess the data by

using Apache Beam, a programming model for defining and executing data processing workflows.

BigQuery is a serverless, scalable, and cost-effective data warehouse that can perform fast and

interactive queries on large datasets. BigQuery ML is a service that can create and train machine

learning models by using SQL queries on BigQuery. However, converting the image data to a tabular

format would lose the spatial structure of the images, which is essential for image

classification. Moreover, BigQuery ML is designed primarily for structured data, so training on

flattened pixel values would be a poor fit for image classification.

Q: 17

You have created a Vertex AI pipeline that includes two steps. The first step preprocesses 10 TB of data, completes in about 1 hour, and saves the result in a Cloud Storage bucket. The second step uses the processed data to train a model. You need to update the model's code to allow you to test different algorithms. You want to reduce pipeline execution time and cost, while also minimizing pipeline changes. What should you do?

Options

Correct Answer: D

Explanation

The best option for reducing pipeline execution time and cost, while also minimizing pipeline changes, is to enable caching for the pipeline job and disable caching for the model training step. This option reuses the output of the data preprocessing step and avoids unnecessary recomputation. Vertex AI Pipelines is a service that orchestrates machine learning workflows on Vertex AI. Caching is a feature of Vertex AI Pipelines that stores the output of a pipeline step and skips the execution of the step if the input parameters and the code have not changed. Caching reduces the pipeline execution time and cost, because you do not need to re-run the same step with the same input and code, and it requires no added or removed pipeline steps or parameters. With caching enabled for the pipeline job, each run reuses the cached output of the 1-hour preprocessing step instead of reprocessing the 10 TB of data. With caching disabled for the model training step, that step always runs, so each updated algorithm is actually trained even if Vertex AI would otherwise treat the step as unchanged. This way, you can reduce the pipeline execution time and cost, while also minimizing pipeline changes [1].
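The caching behavior described above can be modeled with a small step runner. This is a toy illustration in plain Python, not the Vertex AI Pipelines implementation or SDK; the function names, the cache key layout, and the bucket path are hypothetical.

```python
import hashlib
import json

_cache = {}      # cache key -> stored step output
executions = []  # names of the steps that actually ran


def run_step(name, func, inputs, code_version, enable_caching=True):
    """Run a step, or return its cached output when nothing relevant changed."""
    key_material = json.dumps(
        {"name": name, "code": code_version, "inputs": inputs}, sort_keys=True)
    key = hashlib.sha256(key_material.encode()).hexdigest()
    if enable_caching and key in _cache:
        return _cache[key]  # cache hit: skip execution
    executions.append(name)
    output = func(**inputs)
    _cache[key] = output
    return output


def preprocess(source):
    return f"{source}/processed"  # stands in for the 1-hour, 10 TB job


def train(data, algorithm):
    return f"{algorithm} model trained on {data}"


def run_pipeline(algorithm):
    # Caching stays on for preprocessing and is turned off for training.
    data = run_step("preprocess", preprocess,
                    {"source": "gs://example-bucket/raw"}, code_version="v1")
    return run_step("train", train, {"data": data, "algorithm": algorithm},
                    code_version="v1", enable_caching=False)


first = run_pipeline("linear")
second = run_pipeline("boosted_trees")
```

After both runs, `executions` records one preprocessing run and two training runs: the second pipeline run skipped preprocessing but still trained the new algorithm.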

The other options are not as good as option D, for the following reasons:

Option A: Adding a pipeline parameter and an additional pipeline step that, depending on the parameter value, conducts or skips data preprocessing and then starts model training would require more changes than enabling caching for the pipeline job and disabling caching for the model training step. A pipeline parameter is a variable that controls the input or behavior of a pipeline step. You would need to define the parameter, create the additional step, implement the conditional logic, and recompile and run the pipeline. This reproduces by hand what caching already provides, which conflicts with the goal of minimizing pipeline changes [1].

Option B: Creating another pipeline without the preprocessing step, and hardcoding the preprocessed Cloud Storage file location for model training, would require more changes than enabling caching for the pipeline job and disabling caching for the model training step. A pipeline that contains only the training step would avoid re-running preprocessing and so would reduce execution time and cost. However, you would need to create, compile, and run a second pipeline, and the hardcoded file location would have to be updated by hand whenever the preprocessed data changes. Maintaining two pipelines also increases the management effort [1].

Option C: Configuring a machine with more CPU and RAM from the compute-optimized machine family for the data preprocessing step would not meet the goal of reducing both execution time and cost. A compute-optimized machine offers high performance for compute-intensive workloads, so it might shorten the preprocessing step. However, larger machines cost more, and the preprocessing step would still re-run on every pipeline execution instead of reusing its earlier output, so the total cost of testing several algorithms would increase [1].

Reference:

Preparing for Google Cloud Certification: Machine Learning Engineer, Course 3: Production ML

Systems, Week 3: MLOps

Google Cloud Professional Machine Learning Engineer Exam Guide, Section 3: Scaling ML models in

production, 3.2 Automating ML workflows

Official Google Cloud Certified Professional Machine Learning Engineer Study Guide, Chapter 6:

Production ML Systems, Section 6.4: Automating ML Workflows

Vertex AI Pipelines

Caching

Pipeline parameters

Machine types

Q: 18

You are training an ML model on a large dataset. You are using a TPU to accelerate the training process. You notice that the training process is taking longer than expected. You discover that the TPU is not reaching its full capacity. What should you do?

Options

Correct Answer: D

Explanation

When a TPU is not reaching its full capacity during training on a large dataset, the best option is to increase the batch size. A TPU is a custom-developed application-specific integrated circuit (ASIC) that accelerates machine learning workloads, and it is best suited to models dominated by large matrix multiplications, such as deep neural networks. TPUs support frameworks such as TensorFlow, PyTorch, and JAX. The batch size is a parameter that specifies the number of training examples in one forward/backward pass, and it affects both the speed of training and the behavior of the optimizer. A larger batch size gives the TPU more work to do in parallel in each step and spreads the fixed per-step overhead, such as communication between the TPU and the host CPU, over more examples. A larger batch size also reduces the variance of the gradient updates, although the learning rate may need to be retuned to keep the same model quality. By increasing the batch size, you can train your model on a large dataset faster and make full use of the TPU capacity [1].
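The effect of batch size on utilization can be sketched with a back-of-the-envelope model in which every training step pays a fixed overhead plus compute time proportional to the batch size. The timings below are hypothetical, and the model ignores memory limits and assumes compute scales linearly with batch size, so it only illustrates the trend.

```python
def accelerator_utilization(batch_size, per_example_seconds, per_step_overhead_seconds):
    """Fraction of each step spent on useful compute in a simple linear cost model."""
    compute = batch_size * per_example_seconds
    return compute / (compute + per_step_overhead_seconds)


# Assumed timings: 0.1 ms of compute per example, 10 ms of fixed overhead per step.
PER_EXAMPLE = 0.0001
OVERHEAD = 0.01

small_batch = accelerator_utilization(32, PER_EXAMPLE, OVERHEAD)
large_batch = accelerator_utilization(1024, PER_EXAMPLE, OVERHEAD)
```

With these assumed numbers, utilization rises from about 24% at a batch size of 32 to about 91% at a batch size of 1024, because the same fixed overhead is amortized over many more examples per step.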

The other options are not as good as option D, for the following reasons:

Option A: Increasing the learning rate would not help you utilize the parallel processing power of the

TPU, and could cause errors or poor performance. A learning rate is a parameter that controls how

much the model is updated in each iteration. A learning rate can affect the speed and accuracy of the

training process. A larger learning rate can help you converge faster, but it can also cause instability,

divergence, or oscillation. By increasing the learning rate, you may not be able to find the optimal

solution, and your model may perform poorly on the validation or test data [2].

Option B: Increasing the number of epochs would not help you utilize the parallel processing power of the TPU, and would increase the duration and cost of the training process. An epoch is one full pass over all of the training examples. A larger number of epochs can help the model learn more from the data, but it can also cause overfitting or yield diminishing returns. By increasing the number of epochs, you may not improve the model performance significantly, and your training process will take longer and consume more resources [3].

Option C: Decreasing the learning rate would not help you utilize the parallel processing power of the

TPU, and could slow down the training process. A learning rate is a parameter that controls how

much the model is updated in each iteration. A learning rate can affect the speed and accuracy of the

training process. A smaller learning rate can help you find a more precise solution, but it can also

cause slow convergence or local minima. By decreasing the learning rate, you may not be able to

reach the optimal solution in a reasonable time, and your training process may take longer [2].

Reference:

Preparing for Google Cloud Certification: Machine Learning Engineer, Course 2: ML Models and

Architectures, Week 1: Introduction to ML Models and Architectures

Google Cloud Professional Machine Learning Engineer Exam Guide, Section 2: Architecting ML

solutions, 2.1 Designing ML models

Official Google Cloud Certified Professional Machine Learning Engineer Study Guide, Chapter 4: ML

Models and Architectures, Section 4.1: Designing ML Models

Use TPUs

Cloud TPU performance guide

Google TPU: Architecture and Performance Best Practices - Run

Q: 19

You are an ML engineer at a bank. You have developed a binary classification model using AutoML Tables to predict whether a customer will make loan payments on time. The output is used to approve or reject loan requests. One customer’s loan request has been rejected by your model, and the bank’s risks department is asking you to provide the reasons that contributed to the model’s decision. What should you do?

Options

Correct Answer: A

Explanation

Option A is correct because using local feature importance from the predictions is the best way to

provide the reasons that contributed to the model’s decision for a specific customer’s loan

request. Local feature importance is a measure of how much each feature affects the prediction for a

given instance, relative to the average prediction for the dataset [1]. AutoML Tables provides local

feature importance values for each prediction, which can be accessed using the Vertex AI SDK for

Python or the Cloud Console [2]. By using local feature importance, you can explain why the model

rejected the loan request based on the customer’s data.
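To make the idea of a per-instance explanation concrete, the sketch below computes local attributions for a simple linear scoring model, where each feature's contribution is its weight times its distance from a baseline value. This is an illustrative simplification and not the attribution method that AutoML Tables uses; the feature names, weights, baseline, and applicant values are all hypothetical.

```python
def local_attributions(weights, instance, baseline):
    """Per-feature contribution to (score(instance) - score(baseline)) for a linear model."""
    return {name: weights[name] * (instance[name] - baseline[name])
            for name in weights}


def score(weights, values):
    """Linear score: a higher value means the model favors approval."""
    return sum(weights[name] * values[name] for name in weights)


# Hypothetical model weights and an "average customer" baseline.
weights = {"debt_to_income": -4.0, "years_employed": 0.5, "late_payments": -1.5}
baseline = {"debt_to_income": 0.30, "years_employed": 6.0, "late_payments": 1.0}

# Hypothetical rejected applicant.
applicant = {"debt_to_income": 0.55, "years_employed": 2.0, "late_payments": 4.0}

attributions = local_attributions(weights, applicant, baseline)

# Features ordered from the one that pushed the score down the most.
ranked = sorted(attributions, key=attributions.get)
```

For this applicant, the ranking puts the number of late payments first, followed by years employed and then debt-to-income ratio, which is the kind of instance-specific reasoning the risk department is asking for. Global statistics such as dataset-wide correlations cannot produce this ordering, because they do not depend on the individual applicant's values.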

Option B is incorrect because using the correlation with target values in the data summary page is

not a good way to provide the reasons that contributed to the model’s decision for a specific

customer’s loan request. The correlation with target values is a measure of how much each feature is

linearly related to the target variable for the entire dataset, not for a single instance [3]. The data

summary page in AutoML Tables shows the correlation with target values for each feature, as well as

other statistics such as mean, standard deviation, and histogram [4]. However, these statistics are not

useful for explaining the model’s decision for a specific customer, as they do not account for the

interactions between features or the non-linearity of the model.

Option C is incorrect because using the feature importance percentages in the model evaluation

page is not a good way to provide the reasons that contributed to the model’s decision for a specific

customer’s loan request. The feature importance percentages are a measure of how much each

feature affects the overall accuracy of the model for the entire dataset, not for a single instance [5]. The

model evaluation page in AutoML Tables shows the feature importance percentages for each feature,

as well as other metrics such as precision, recall, and confusion matrix. However, these metrics are

not useful for explaining the model’s decision for a specific customer, as they do not reflect the

individual contribution of each feature for a given prediction.

Option D is incorrect because varying features independently to identify the threshold per feature

that changes the classification is not a feasible way to provide the reasons that contributed to the

model’s decision for a specific customer’s loan request. This method involves changing the value of

one feature at a time, while keeping the other features constant, and observing how the prediction

changes. However, this method is not practical, as it requires making multiple prediction requests,

and may not capture the interactions between features or the non-linearity of the model.

Reference:

Local feature importance

Getting local feature importance values

Correlation with target values

Data summary page

Feature importance percentages

[Model evaluation page]

[Varying features independently]

Q: 20

You are working on a Neural Network-based project. The dataset provided to you has columns with different ranges. While preparing the data for model training, you discover that gradient optimization is having difficulty moving weights to a good solution. What should you do?

Options

Q: 21

You are training an ML model using data stored in BigQuery that contains several values that are considered Personally Identifiable Information (PII). You need to reduce the sensitivity of the dataset before training your model. Every column is critical to your model. How should you proceed?

Options

Correct Answer: B

Explanation

The best option for reducing the sensitivity of the dataset before training the model is to use the

Cloud Data Loss Prevention (DLP) API to scan for sensitive data, and use Dataflow with the DLP API to

encrypt sensitive values with Format Preserving Encryption. This option allows you to keep every

column in the dataset, while protecting the sensitive data from unauthorized access or exposure. The

Cloud DLP API can detect and classify various types of sensitive data, such as names, email

addresses, phone numbers, credit card numbers, and more [1]. Dataflow can create scalable and

reliable pipelines to process large volumes of data from BigQuery and other sources [2]. Format

Preserving Encryption (FPE) is a technique that encrypts sensitive data while preserving its original

format and length, which can help maintain the utility and validity of the data [3]. By using Dataflow

with the DLP API, you can apply FPE to the sensitive values in the dataset, and store the encrypted

data in BigQuery or another destination. You can also use the same pipeline to decrypt the data

when needed, by using the same encryption key and method [4].
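The property that matters here, a ciphertext with the same format and length as the input that can be reversed with the same key, can be demonstrated with a toy keyed Feistel construction over decimal digit strings. This is only a teaching sketch: it is not the algorithm used by the Cloud DLP API, it is not a vetted standard such as FF1, and it must not be used to protect real data. The function names, key, and sample value are hypothetical.

```python
import hashlib
import hmac


def _round_value(key, round_index, half):
    """Keyed pseudo-random integer derived from the round number and one half."""
    message = f"{round_index}:{half}".encode()
    return int.from_bytes(hmac.new(key, message, hashlib.sha256).digest(), "big")


def _feistel(digits, key, rounds, decrypt):
    if not (digits.isascii() and digits.isdigit()) or len(digits) < 2:
        raise ValueError("expected a string of at least two decimal digits")
    left_len = len(digits) // 2
    right_len = len(digits) - left_len
    left, right = int(digits[:left_len]), int(digits[left_len:])
    sign = -1 if decrypt else 1
    order = reversed(range(rounds)) if decrypt else range(rounds)
    for i in order:
        if i % 2 == 0:
            # Even rounds update the left half using the (unchanged) right half.
            mask = _round_value(key, i, str(right).zfill(right_len))
            left = (left + sign * mask) % 10 ** left_len
        else:
            # Odd rounds update the right half using the (unchanged) left half.
            mask = _round_value(key, i, str(left).zfill(left_len))
            right = (right + sign * mask) % 10 ** right_len
    return str(left).zfill(left_len) + str(right).zfill(right_len)


def toy_fpe_encrypt(digits, key, rounds=10):
    return _feistel(digits, key, rounds, decrypt=False)


def toy_fpe_decrypt(digits, key, rounds=10):
    return _feistel(digits, key, rounds, decrypt=True)


key = b"demo-key-not-for-production"
phone = "4155550123"  # hypothetical 10-digit value
token = toy_fpe_encrypt(phone, key)
recovered = toy_fpe_decrypt(token, key)
```

The token is still a 10-digit string, so it fits the original column type and downstream validation rules, and decrypting it with the same key returns the original value. A general-purpose cipher such as AES would instead produce output of a different length and character set.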

The other options are not as suitable as option B, for the following reasons:

Option A: Using Dataflow to ingest the columns with sensitive data from BigQuery, and then

randomize the values in each sensitive column, would reduce the sensitivity of the data, but also the

utility and accuracy of the data. Randomization is a technique that replaces sensitive data with

random values, which can prevent re-identification of the data, but also distort the distribution and

relationships of the data [3]. This can affect the performance and quality of the ML model, especially if

every column is critical to the model.

Option C: Using the Cloud DLP API to scan for sensitive data, and use Dataflow to replace all sensitive

data by using the encryption algorithm AES-256 with a salt, would reduce the sensitivity of the data,

but also the utility and validity of the data. AES-256 is a symmetric encryption algorithm that uses a

256-bit key to encrypt and decrypt data. A salt is a random value that is added to the data before

encryption, to increase the randomness and security of the encrypted data. However, AES-256 does

not preserve the format or length of the original data, which can cause problems when storing or

processing the data. For example, if the original data is a 10-digit phone number, AES-256 would

produce a much longer and different string, which can break the schema or logic of the dataset [3].
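The length change in the phone number example can be worked through with simple arithmetic. The sketch below assumes a block mode with PKCS#7 padding, a 16-byte salt or IV stored alongside the ciphertext, and base64 text encoding; these choices are assumptions for illustration, not details given in the question.

```python
import math

AES_BLOCK_BYTES = 16  # AES block size is 16 bytes regardless of key length


def padded_ciphertext_bytes(plaintext_len, prefix_bytes=0):
    """Ciphertext size with PKCS#7 padding, plus an optional stored prefix."""
    # PKCS#7 always adds at least one byte, so a full extra block is added
    # when the plaintext already fills a whole number of blocks.
    blocks = plaintext_len // AES_BLOCK_BYTES + 1
    return prefix_bytes + blocks * AES_BLOCK_BYTES


def base64_chars(num_bytes):
    """Number of characters needed to base64-encode num_bytes (with padding)."""
    return 4 * math.ceil(num_bytes / 3)


phone_len = 10  # a 10-digit phone number stored as ASCII text
without_prefix = base64_chars(padded_ciphertext_bytes(phone_len))
with_prefix = base64_chars(padded_ciphertext_bytes(phone_len, prefix_bytes=16))
print(without_prefix, with_prefix)  # 24 44
```

Under these assumptions a 10-character value grows to 24 or 44 characters and no longer consists only of digits, which is the schema problem that FPE avoids.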

Option D: Before training, using BigQuery to select only the columns that do not contain sensitive

data, and creating an authorized view of the data so that sensitive values cannot be accessed by

unauthorized individuals, would reduce the exposure of the sensitive data, but also the

completeness and relevance of the data. An authorized view is a BigQuery view that allows you to

share query results with particular users or groups, without giving them access to the underlying

tables. However, this option assumes that you can identify the columns that do not contain sensitive

data, which may not be easy or accurate. Moreover, this option would remove some columns from

the dataset, which can affect the performance and quality of the ML model, especially if every

column is critical to the model.

Reference:

Preparing for Google Cloud Certification: Machine Learning Engineer, Course 5: Responsible AI,

Week 2: Privacy

Google Cloud Professional Machine Learning Engineer Exam Guide, Section 5: Developing

responsible AI solutions, 5.2 Implementing privacy techniques

Official Google Cloud Certified Professional Machine Learning Engineer Study Guide, Chapter 9:

Responsible AI, Section 9.4: Privacy

De-identification techniques

Cloud Data Loss Prevention (DLP) API

Dataflow

Using Dataflow and Sensitive Data Protection to securely tokenize and import data from a relational

database to BigQuery

[AES encryption]

[Salt (cryptography)]

[Authorized views]

Q: 22

You are building a linear regression model on BigQuery ML to predict a customer's likelihood of purchasing your company's products. Your model uses a city name variable as a key predictive component. In order to train and serve the model, your data must be organized in columns. You want to prepare your data using the least amount of coding while maintaining the predictive variables. What should you do?

Options

Q: 23

You recently trained an XGBoost model that you plan to deploy to production for online inference. Before sending a predict request to your model's binary, you need to perform a simple data preprocessing step. This step exposes a REST API that accepts requests in your internal VPC Service Controls and returns predictions. You want to configure this preprocessing step while minimizing cost and effort. What should you do?

Options

Q: 24

You have built a model that is trained on data stored in Parquet files. You access the data through a Hive table hosted on Google Cloud. You preprocessed these data with PySpark and exported it as a CSV file into Cloud Storage. After preprocessing, you execute additional steps to train and evaluate your model. You want to parametrize this model training in Kubeflow Pipelines. What should you do?

Options

Correct Answer:

Explanation

The best option for parametrizing the model training in Kubeflow Pipelines is to add a ContainerOp

to the pipeline that spins up a Dataproc cluster, runs a transformation, and then saves the transformed

data in Cloud Storage. This option has the following advantages:

It allows the data transformation to be performed as part of the Kubeflow Pipeline, which can ensure

the consistency and reproducibility of the data processing and the model training. By adding a

ContainerOp to the pipeline, you can define the parameters and the logic of the data transformation

step, and integrate it with the other steps of the pipeline, such as the model training and evaluation.

It leverages the scalability and performance of Dataproc, which is a fully managed service that runs

Apache Spark and Apache Hadoop clusters on Google Cloud. By spinning up a Dataproc cluster, you can

run the PySpark transformation on the Parquet files stored in the Hive table, and take advantage of

the parallelism and speed of Spark. Dataproc also supports various features and integrations, such as

autoscaling, preemptible VMs, and connectors to other Google Cloud services, that can optimize the

data processing and reduce the cost.

It simplifies the data storage and access, as the transformed data is saved in Cloud Storage, which is a

scalable, durable, and secure object storage service. By saving the transformed data in Cloud

Storage, you can avoid the overhead and complexity of managing the data in the Hive table or the

Parquet files. Moreover, you can easily access the transformed data from Cloud Storage, using

various tools and frameworks, such as TensorFlow, BigQuery, or Vertex AI.

The other options are less optimal for the following reasons:

Option A: Removing the data transformation step from the pipeline eliminates the parametrization

of the model training, as the data processing and the model training are decoupled and independent.

This option requires running the PySpark transformation separately from the Kubeflow Pipeline,

which can introduce inconsistency and irreproducibility in the data processing and the model

training. Moreover, this option requires managing the data in the Hive table or the Parquet files,

which can be cumbersome and inefficient.

Option B: Containerizing the PySpark transformation step, and adding it to the pipeline introduces

additional complexity and overhead. This option requires creating and maintaining a Docker image

that can run the PySpark transformation, which can be challenging and time-consuming. Moreover,

this option requires running the PySpark transformation on a single container, which can be slow and

inefficient, as it does not leverage the parallelism and performance of Spark.

Option D: Deploying Apache Spark at a separate node pool in a Google Kubernetes Engine cluster,

and adding a ContainerOp to the pipeline that invokes a corresponding transformation job for this

Spark instance introduces additional complexity and cost. This option requires creating and managing

a separate node pool in a Google Kubernetes Engine cluster, which is a fully managed service that

runs Kubernetes clusters on Google Cloud. Moreover, this option requires deploying and running

Apache Spark on the node pool, which can be tedious and costly, as it requires configuring and

maintaining the Spark cluster, and paying for the node pool usage.
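To make the parametrization idea concrete, the sketch below shows how pipeline parameters could flow into the PySpark job specification that the Dataproc step submits. The dictionary follows the shape of a Dataproc job resource as best understood here; the cluster name, script location, table name, and output path are hypothetical, and the ContainerOp wrapper and job submission are omitted.

```python
def build_pyspark_job(cluster_name, main_uri, input_table, output_uri):
    """Build a Dataproc-style PySpark job spec from pipeline parameters."""
    return {
        # Which cluster runs the job.
        "placement": {"cluster_name": cluster_name},
        "pyspark_job": {
            # The PySpark transformation script, stored in Cloud Storage.
            "main_python_file_uri": main_uri,
            # Pipeline parameters are forwarded as script arguments.
            "args": ["--input-table", input_table, "--output-uri", output_uri],
        },
    }


# Hypothetical parameter values a pipeline run might supply.
job = build_pyspark_job(
    cluster_name="preprocess-cluster",
    main_uri="gs://example-bucket/jobs/transform.py",
    input_table="sales.transactions",
    output_uri="gs://example-bucket/preprocessed/run-001/",
)
```

Because every value is an argument, each pipeline run can point the same transformation at a different table or output path without changing the step itself.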

Q: 25

You trained a model on data stored in a Cloud Storage bucket. The model needs to be retrained frequently in Vertex AI Training using the latest data in the bucket. Data preprocessing is required prior to retraining. You want to build a simple and efficient near-real-time ML pipeline in Vertex AI that will preprocess the data when new data arrives in the bucket. What should you do?

Options

Q: 26

You are developing a model to help your company create more targeted online advertising campaigns. You need to create a dataset that you will use to train the model. You want to avoid creating or reinforcing unfair bias in the model. What should you do? Choose 2 answers

Options

Q: 27

You built a deep learning-based image classification model by using on-premises data. You want to use Vertex AI to deploy the model to production. Due to security concerns, you cannot move your data to the cloud. You are aware that the input data distribution might change over time. You need to detect model performance changes in production. What should you do?

Options

Q: 28

You need to execute a batch prediction on 100 million records in a BigQuery table with a custom TensorFlow DNN regressor model, and then store the predicted results in a BigQuery table. You want to minimize the effort required to build this inference pipeline. What should you do?

Options

Correct Answer:

Explanation

Option A is correct because importing the TensorFlow model with BigQuery ML, and running the

ML.PREDICT function is the easiest way to execute a batch prediction on a large BigQuery table with a

custom TensorFlow model, and store the predicted results in another BigQuery table. BigQuery ML

allows you to import TensorFlow models that are stored in Cloud Storage, and use them for

prediction with SQL queries [1]. The ML.PREDICT function returns a table with the predicted values,

which can be saved to another BigQuery table [2].
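The two steps in option A can be sketched as a pair of SQL statements. The helper functions below only render the statements as strings; the dataset, table, and bucket names are hypothetical, and running the statements (for example through the BigQuery console or a client library) is left out.

```python
def import_model_sql(model, model_path):
    """Render the statement that imports a TensorFlow SavedModel into BigQuery ML."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (MODEL_TYPE = 'TENSORFLOW', MODEL_PATH = '{model_path}')"
    )


def batch_predict_sql(model, source_table, dest_table):
    """Render the statement that scores a table and stores the predictions."""
    return (
        f"CREATE OR REPLACE TABLE `{dest_table}` AS\n"
        f"SELECT * FROM ML.PREDICT(MODEL `{model}`, TABLE `{source_table}`)"
    )


# Hypothetical resource names, for illustration only.
print(import_model_sql("demo.dnn_regressor", "gs://example-bucket/saved_model/*"))
print(batch_predict_sql("demo.dnn_regressor", "demo.records", "demo.predictions"))
```

No data leaves BigQuery in this flow, which is why it needs less pipeline code than the Dataflow-based options.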

Option B is incorrect because using the TensorFlow BigQuery reader to load the data, and using the

BigQuery API to write the results to BigQuery requires more effort to build the inference pipeline

than option A. The TensorFlow BigQuery reader is a way to read data from BigQuery into TensorFlow

datasets, which can be used for training or prediction [3]. However, this option also requires writing

code to load the TensorFlow model, run the prediction, and use the BigQuery API to write the results

back to BigQuery [4].

Option C is incorrect because creating a Dataflow pipeline to convert the data in BigQuery to

TFRecords, running a batch inference on Vertex AI Prediction, and writing the results to BigQuery

requires more effort to build the inference pipeline than option A. Dataflow is a service for creating

and running data processing pipelines, such as ETL (extract, transform, load) or batch processing [5].

Vertex AI Prediction is a service for deploying and serving ML models for online or batch prediction.

However, this option also requires writing code to create the Dataflow pipeline, convert the data to

TFRecords, run the batch inference, and write the results to BigQuery.

Option D is incorrect because loading the TensorFlow SavedModel in a Dataflow pipeline, using the

BigQuery I/O connector with a custom function to perform the inference within the pipeline, and

writing the results to BigQuery requires more effort to build the inference pipeline than option A.

The BigQuery I/O connector is a way to read and write data from BigQuery within a Dataflow

pipeline. However, this option also requires writing code to load the TensorFlow SavedModel, create

the custom function for inference, and write the results to BigQuery.

Reference:

Importing models into BigQuery ML

Using imported models for prediction

TensorFlow BigQuery reader

BigQuery API

Dataflow overview

[Vertex AI Prediction overview]

[Batch prediction with Dataflow]

[BigQuery I/O connector]

[Using TensorFlow models in Dataflow]

Q: 29

You need to design an architecture that serves asynchronous predictions to determine whether a particular mission-critical machine part will fail. Your system collects data from multiple sensors from the machine. You want to build a model that will predict a failure in the next N minutes, given the average of each sensor’s data from the past 12 hours. How should you design the architecture?

Options

Q: 30

You need to analyze user activity data from your company’s mobile applications. Your team will use BigQuery for data analysis, transformation, and experimentation with ML algorithms. You need to ensure real-time ingestion of the user activity data into BigQuery. What should you do?

Options

Correct Answer:

Explanation

The best option to ensure real-time ingestion of the user activity data into BigQuery is to run a

Dataflow streaming job to ingest the data into BigQuery. Dataflow is a fully managed service that can

handle both batch and stream processing of data, and can integrate seamlessly with BigQuery and

other Google Cloud services. Dataflow can also use Apache Beam as the programming model, which

provides a unified and portable API for developing data pipelines. By using Dataflow, you can avoid

the complexity and overhead of managing your own infrastructure, and focus on the logic and

transformation of your data. Dataflow can also handle various types of data, such as structured,

unstructured, or binary data, and can apply windowing, aggregation, and other operations on the

data streams.
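The windowing and aggregation mentioned above can be illustrated without any streaming framework. The sketch below is plain Python, not the Apache Beam API; it groups hypothetical (timestamp, user) events into fixed, non-overlapping windows and counts events per user in each window, which is the kind of result a streaming job would write to BigQuery.

```python
from collections import Counter


def fixed_window_counts(events, window_seconds):
    """Count events per (window_start, user_id) using fixed windows.

    events is an iterable of (timestamp_seconds, user_id) pairs.
    """
    counts = Counter()
    for timestamp, user_id in events:
        # Round the timestamp down to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, user_id)] += 1
    return dict(counts)


# Hypothetical user activity events: (seconds since start, user id).
events = [(0, "u1"), (30, "u1"), (61, "u2"), (119, "u1"), (120, "u1")]
per_minute = fixed_window_counts(events, 60)
print(per_minute)
```

A real streaming job also has to handle late and out-of-order events, which this sketch ignores.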

The other options are not optimal for the following reasons:

Option A: Configuring Pub/Sub to stream the data into BigQuery is not a good option, as Pub/Sub is a

messaging service that can publish and subscribe to data streams, but cannot perform any

transformation or processing on the data. Pub/Sub can be used as a source or a sink for Dataflow, but

not as a standalone solution for ingesting data into BigQuery.

Option B: Running an Apache Spark streaming job on Dataproc to ingest the data into BigQuery is not a

good option, as it requires setting up and managing your own cluster of virtual machines, which can

increase the cost and complexity of your solution. Moreover, Apache Spark is not natively integrated

with BigQuery, and requires using connectors or intermediate storage to write data to BigQuery,

which can introduce latency and inefficiency.

Option D: Configuring Pub/Sub and a Dataflow streaming job to ingest the data into BigQuery is not a bad

option, but it is not necessary, as Dataflow can directly read data from the mobile applications

without using Pub/Sub as an intermediary. Using Pub/Sub can add an extra layer of abstraction and

reliability, but it can also increase the cost and complexity of your solution, and introduce some delay

in the data ingestion.

Reference:

Professional ML Engineer Exam Guide

Preparing for Google Cloud Certification: Machine Learning Engineer Professional Certificate

Google Cloud launches machine learning engineer certification

Dataflow documentation

BigQuery documentation

Q: 31

You want to train an AutoML model to predict house prices by using a small public dataset stored in BigQuery. You need to prepare the data and want to use the simplest most efficient approach. What should you do?

Options

Correct Answer:

Explanation

The simplest and most efficient approach for preparing the data for AutoML is to use BigQuery and

Vertex AI. BigQuery is a serverless, scalable, and cost-effective data warehouse that can perform fast

and interactive queries on large datasets. BigQuery can preprocess the data by using SQL functions

such as filtering, aggregating, joining, transforming, and creating new features. The preprocessed

data can be stored in a new table in BigQuery, which can be used as the data source for Vertex AI.

Vertex AI is a unified platform for building and deploying machine learning solutions on Google

Cloud. Vertex AI can create a managed dataset from a BigQuery table, which can be used to train an

AutoML model. Vertex AI can also evaluate, deploy, and monitor the AutoML model, and provide

online or batch predictions. By using BigQuery and Vertex AI, users can leverage the power and

simplicity of Google Cloud to train an AutoML model to predict house prices.

The other options are not as simple or efficient as option A, for the following reasons:

Option B: Using Dataflow to preprocess the data and write the output in TFRecord format to a Cloud

Storage bucket would require more steps and resources than using BigQuery and Vertex AI. Dataflow

is a service that can create scalable and reliable pipelines to process large volumes of data from

various sources. Dataflow can preprocess the data by using Apache Beam, a programming model for

defining and executing data processing workflows. TFRecord is a binary file format that can store

sequential data efficiently. However, using Dataflow and TFRecord would require writing code,

setting up a pipeline, choosing a runner, and managing the output files. Moreover, TFRecord is not a

supported format for Vertex AI managed datasets, so the data would need to be converted to CSV or

JSONL files before creating a Vertex AI managed dataset.

Option C: Writing a query that preprocesses the data by using BigQuery and exporting the query

results as CSV files would require more steps and storage than using BigQuery and Vertex AI. CSV is a

text file format that can store tabular data in a comma-separated format. Exporting the query results

as CSV files would require choosing a destination Cloud Storage bucket, specifying a file name or a

wildcard, and setting the export options. Moreover, CSV files can have limitations such as size,

schema, and encoding, which can affect the quality and validity of the data. Exporting the data as

CSV files would also incur additional storage costs and reduce the performance of the queries.

Option D: Using a Vertex AI Workbench notebook instance to preprocess the data by using the

pandas library and exporting the data as CSV files would require more steps and skills than using

BigQuery and Vertex AI. Vertex AI Workbench is a service that provides an integrated development

environment for data science and machine learning. Vertex AI Workbench allows users to create and

run Jupyter notebooks on Google Cloud, and access various tools and libraries for data analysis and

machine learning. Pandas is a popular Python library that can manipulate and analyze data in a

tabular format. However, using Vertex AI Workbench and pandas would require creating a notebook

instance, writing Python code, installing and importing pandas, connecting to BigQuery, loading and

preprocessing the data, and exporting the data as CSV files. Moreover, pandas can have limitations

such as memory usage, scalability, and compatibility, which can affect the efficiency and reliability of

the data processing.

Reference:

Preparing for Google Cloud Certification: Machine Learning Engineer, Course 2: Data Engineering for

ML on Google Cloud, Week 1: Introduction to Data Engineering for ML

Google Cloud Professional Machine Learning Engineer Exam Guide, Section 1: Architecting low-code

ML solutions, 1.3 Training models by using AutoML

Official Google Cloud Certified Professional Machine Learning Engineer Study Guide, Chapter 4: Low-

code ML Solutions, Section 4.3: AutoML

BigQuery

Vertex AI

Dataflow

TFRecord

CSV

Vertex AI Workbench

Pandas

Q: 32

You need to develop a custom TensorFlow model that will be used for online predictions. The training data is stored in BigQuery. You need to apply instance-level data transformations to the data for model training and serving. You want to use the same preprocessing routine during model training and serving. How should you configure the preprocessing routine?

Options

Q: 33

You are creating a social media app where pet owners can post images of their pets. You have one million user uploaded images with hashtags. You want to build a comprehensive system that recommends images to users that are similar in appearance to their own uploaded images. What should you do?

Options

Correct Answer:

Explanation

The best option to build a comprehensive system that recommends images to users that are similar

in appearance to their own uploaded images is to download a pretrained convolutional neural

network (CNN), and use the model to generate embeddings of the input images. Embeddings are

low-dimensional representations of high-dimensional data that capture the essential features and

semantics of the data. By using a pretrained CNN, you can leverage the knowledge learned from

large-scale image datasets, such as ImageNet, and apply it to your own domain. A pretrained CNN

can be used as a feature extractor, where the output of the last hidden layer (or any intermediate

layer) is taken as the embedding vector for the input image. You can then measure the similarity

between embeddings using a distance metric, such as cosine similarity or Euclidean distance, and

recommend images that have the highest similarity scores to the user’s uploaded image.

Option A is incorrect because downloading a pretrained CNN and fine-tuning the model to predict

hashtags based on the input images may not capture the visual similarity of the images, as hashtags

may not reflect the appearance of the images accurately. For example, two images of different

breeds of dogs may have the same hashtag #dog, but they may not look similar to each other.

Moreover, fine-tuning the model may require additional data and computational resources, and it

may not generalize well to new images that have different or missing hashtags.

Option B is incorrect because retrieving image labels and dominant colors from the input images

using the Vision API may not capture the visual similarity of the images, as labels and colors may not

reflect the fine-grained details of the images. For example, two images of the same breed of dog

may have different labels and colors depending on the background, lighting, and angle of the image.

Moreover, using the Vision API may incur additional costs and latency, and it may not be able to

handle custom or domain-specific labels.

Option C is incorrect because using the provided hashtags to create a collaborative filtering algorithm

may not capture the visual similarity of the images, as collaborative filtering relies on the ratings or

preferences of users, not the features of the images. For example, two images of different animals

may have similar ratings or preferences from users, but they may not look similar to each other.

Moreover, collaborative filtering may suffer from the cold start problem, where new images or users

that have no ratings or preferences cannot be recommended.

Reference:

Image similarity search with TensorFlow

Image embeddings documentation

Pretrained models documentation

Similarity metrics documentation
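The similarity step in the explanation above can be sketched in a few lines. The example assumes the embeddings have already been produced by a pretrained CNN; the three-dimensional vectors and image IDs below are made up for illustration (real embeddings typically have hundreds or thousands of dimensions), and zero-length vectors are not handled.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, catalog, k=2):
    """Return the IDs of the k catalog images closest to the query embedding."""
    scored = [
        (cosine_similarity(query, embedding), image_id)
        for image_id, embedding in catalog.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [image_id for _, image_id in scored[:k]]


# Hypothetical embeddings for three uploaded images.
catalog = {
    "corgi_1": [0.9, 0.1, 0.0],
    "corgi_2": [0.8, 0.2, 0.1],
    "tabby_1": [0.1, 0.9, 0.2],
}
query = [1.0, 0.1, 0.0]  # embedding of the user's own upload
print(most_similar(query, catalog))  # ['corgi_1', 'corgi_2']
```

With a million images, an exhaustive scan like this becomes slow, so an approximate nearest-neighbor index is the usual choice in practice.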

Q: 34

You work with a team of researchers to develop state-of-the-art algorithms for financial analysis. Your team develops and debugs complex models in TensorFlow. You want to maintain the ease of debugging while also reducing the model training time. How should you set up your training environment?

Options

Q: 35

You manage a team of data scientists who use a cloud-based backend system to submit training jobs. This system has become very difficult to administer, and you want to use a managed service instead. The data scientists you work with use many different frameworks, including Keras, PyTorch, Theano, scikit-learn, and custom libraries. What should you do?

Options

Correct Answer:

Explanation

A cloud-based backend system is a system that runs on a cloud platform and provides services or

resources to other applications or users. A cloud-based backend system can be used to submit

training jobs, which are tasks that involve training a machine learning model on a given dataset using

a specific framework and configuration [1].

However, a cloud-based backend system can also have some drawbacks, such as:

High maintenance: A cloud-based backend system may require a lot of administration and

management, such as provisioning, scaling, monitoring, and troubleshooting the cloud resources and

services. This can be time-consuming and costly, and may distract from the core business objectives [2].

Low flexibility: A cloud-based backend system may not support all the frameworks and libraries that

the data scientists need to use for their training jobs. This can limit the choices and capabilities of the

data scientists, and affect the quality and performance of their models [3].

Poor integration: A cloud-based backend system may not integrate well with other cloud services or

tools that the data scientists need to use for their machine learning workflows, such as data

processing, model deployment, or model monitoring. This can create compatibility and

interoperability issues, and reduce the efficiency and productivity of the data scientists.

Therefore, it may be better to use a managed service instead of a cloud-based backend system to

submit training jobs. A managed service is a service that is provided and operated by a third-party

provider, and offers various benefits, such as:

Low maintenance: A managed service handles the administration and management of the cloud

resources and services, and abstracts away the complexity and details of the underlying

infrastructure. This can save time and money, and allow the data scientists to focus on their core

tasks [2].

High flexibility: A managed service can support multiple frameworks and libraries that the data

scientists need to use for their training jobs, and allow them to customize and configure their training

environments and parameters. This can enhance the choices and capabilities of the data scientists,

and improve the quality and performance of their models [3].

Easy integration: A managed service can integrate seamlessly with other cloud services or tools that

the data scientists need to use for their machine learning workflows, and provide a unified and

consistent interface and experience. This can solve the compatibility and interoperability issues, and

increase the efficiency and productivity of the data scientists.

One of the best options for using a managed service to submit training jobs is to use the AI Platform

custom containers feature to receive training jobs using any framework. AI Platform is a Google

Cloud service that provides a platform for building, deploying, and managing machine learning

models. AI Platform supports various machine learning frameworks, such as TensorFlow, PyTorch,

scikit-learn, and XGBoost, and provides various features, such as hyperparameter tuning, distributed

training, online prediction, and model monitoring.

The AI Platform custom containers feature allows the data scientists to use any framework or library

that they want for their training jobs, and package their training application and dependencies as a

Docker container image. The data scientists can then submit their training jobs to AI Platform, and

specify the container image and the training parameters. AI Platform will run the training jobs on the

cloud infrastructure, and handle the scaling, logging, and monitoring of the training jobs. The data

scientists can also use the AI Platform features to optimize, deploy, and manage their models.
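As a rough sketch of what submitting such a job involves, the function below assembles a training job request that points at a custom container image. The field names follow the AI Platform Training job resource as best understood here; the job ID, image URI, region, and arguments are hypothetical, and the actual submission call is not shown.

```python
def build_custom_container_job(job_id, image_uri, region, args):
    """Build an AI Platform Training-style job request for a custom container."""
    return {
        "jobId": job_id,
        "trainingInput": {
            "scaleTier": "BASIC",
            "region": region,
            # The container bundles the framework and all dependencies,
            # so the service does not need to know which framework is used.
            "masterConfig": {"imageUri": image_uri},
            # Arguments passed through to the container's entry point.
            "args": list(args),
        },
    }


# Hypothetical values: a PyTorch trainer packaged by the data science team.
job = build_custom_container_job(
    job_id="pytorch_train_001",
    image_uri="gcr.io/example-project/pytorch-trainer:latest",
    region="us-central1",
    args=["--epochs", "10"],
)
```

The same request shape works for a Keras, scikit-learn, or custom-library image, since only the `imageUri` and arguments change.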

The other options are not as suitable or feasible. Configuring Kubeflow to run on Google Kubernetes

Engine and receive training jobs through TFJob is not ideal, as TFJob is designed for

TensorFlow training jobs and does not cover the other frameworks and custom libraries in use. Creating a

library of VM images on Compute Engine and publishing these images on a centralized repository is

not optimal, as Compute Engine is a low-level service that requires a lot of administration and

management, and does not provide the features and integrations of AI Platform. Setting up Slurm

workload manager to receive jobs that can be scheduled to run on your cloud infrastructure is not

relevant, as Slurm is a tool for managing and scheduling jobs on a cluster of nodes, and does not

provide a managed service for training jobs.

Reference:

1: Cloud computing

2: Managed services

3: Machine learning frameworks

[Machine learning workflow]

[AI Platform overview]

[Custom containers for training]
