Google Machine Learning Engineer
Q: 1
You work as an analyst at a large banking firm. You are developing a robust, scalable ML pipeline to
train several regression and classification models. Your primary focus for the pipeline is model
interpretability. You want to productionize the pipeline as quickly as possible. What should you do?
Options
Q: 2
You work for a bank. You have been asked to develop an ML model that will support loan application
decisions. You need to determine which Vertex AI services to include in the workflow. You want to
track the model's training parameters and the metrics per training epoch. You plan to compare the
performance of each version of the model to determine the best model based on your chosen
metrics. Which Vertex AI services should you use?
Options
Q: 3
You developed a custom model by using Vertex AI to forecast the sales of your company's products
based on historical transactional data. You anticipate changes in the feature distributions and the
correlations between the features in the near future. You also expect to receive a large volume of
prediction requests. You plan to use Vertex AI Model Monitoring for drift detection and you want to
minimize the cost. What should you do?
Options
Q: 4
You work for a pet food company that manages an online forum. Customers upload photos of their
pets on the forum to share with others. About 20 photos are uploaded daily. You want to
automatically and in near real time detect whether each uploaded photo has an animal. You want to
prioritize time and minimize cost of your application development and deployment. What should you
do?
Options
Q: 5
You recently developed a deep learning model using Keras, and now you are experimenting with
different training strategies. First, you trained the model using a single GPU, but the training process
was too slow. Next, you distributed the training across 4 GPUs using tf.distribute.MirroredStrategy
(with no other changes), but you did not observe a decrease in training time. What should you do?
Options
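For reference, tf.distribute.MirroredStrategy performs synchronous data-parallel training and splits each global batch across the replicas. The sketch below uses plain arithmetic (no TensorFlow) to show what that implies when the batch size is left unchanged after moving to 4 GPUs. The dataset size and the batch size of 64 are hypothetical values chosen for illustration; they do not come from the question.

```python
import math

def per_replica_batch(global_batch_size, num_replicas):
    """Examples each replica handles per step when the global batch is split evenly."""
    return global_batch_size // num_replicas

def steps_per_epoch(num_examples, global_batch_size):
    """Number of optimizer steps needed to see every example once."""
    return math.ceil(num_examples / global_batch_size)

NUM_EXAMPLES = 100_000  # hypothetical dataset size
BATCH_SIZE = 64         # hypothetical batch size used on the single GPU
NUM_GPUS = 4

# Same global batch on 4 GPUs: each GPU gets a quarter of the work per step,
# but the number of steps per epoch does not change.
unchanged_steps = steps_per_epoch(NUM_EXAMPLES, BATCH_SIZE)
unchanged_per_gpu = per_replica_batch(BATCH_SIZE, NUM_GPUS)

# Global batch scaled by the number of GPUs: each GPU keeps the original
# per-device batch, and the number of steps per epoch shrinks.
scaled_steps = steps_per_epoch(NUM_EXAMPLES, BATCH_SIZE * NUM_GPUS)
scaled_per_gpu = per_replica_batch(BATCH_SIZE * NUM_GPUS, NUM_GPUS)

print(unchanged_steps, unchanged_per_gpu)
print(scaled_steps, scaled_per_gpu)
```

With an unchanged global batch, each device is given a smaller slice per step while the step count stays the same, so per-step overhead can dominate.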
Q: 6
You need to train a computer vision model that predicts the type of government ID present in a given
image using a GPU-powered virtual machine on Compute Engine. You use the following parameters:
• Optimizer: SGD
• Image shape: 224x224
• Batch size: 64
• Epochs: 10
• Verbose: 2
During training, you encounter the following error: ResourceExhaustedError: Out of memory (OOM)
when allocating tensor. What should you do?
Options
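As a rough illustration of how the parameters listed in this question relate to memory, the sketch below computes the size of the input tensor alone for one batch. The 3 color channels and 4-byte float32 values are assumptions not stated in the question, and intermediate activations in a real model usually need far more memory than the input itself.

```python
def input_batch_bytes(batch_size, height, width, channels=3, bytes_per_value=4):
    """Bytes needed to hold one batch of input images.

    channels=3 and bytes_per_value=4 (float32) are assumptions for illustration.
    """
    return batch_size * height * width * channels * bytes_per_value

for batch_size in (64, 32, 16):
    mib = input_batch_bytes(batch_size, 224, 224) / 2**20
    print(f"batch_size={batch_size}: {mib:.2f} MiB for the input tensor alone")
```

The footprint grows linearly with the batch size, and the same proportionality applies to per-example activations.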
Q: 7
Your data science team has requested a system that supports scheduled model retraining, Docker
containers, and a service that supports autoscaling and monitoring for online prediction requests.
Which platform components should you choose for this system?
Options
Q: 8
Your team has a model deployed to a Vertex AI endpoint. You have created a Vertex AI pipeline that
automates the model training process and is triggered by a Cloud Function. You need to prioritize
keeping the model up-to-date, but also minimize retraining costs. How should you configure
retraining?
Options
Q: 9
You are developing a Kubeflow pipeline on Google Kubernetes Engine. The first step in the pipeline is
to issue a query against BigQuery. You plan to use the results of that query as the input to the next
step in your pipeline. You want to achieve this in the easiest way possible. What should you do?
Options
Q: 10
You are experimenting with a built-in distributed XGBoost model in Vertex AI Workbench user-
managed notebooks. You use BigQuery to split your data into training and validation sets using the
following queries:
CREATE OR REPLACE TABLE `myproject.mydataset.training` AS
(SELECT * FROM `myproject.mydataset.mytable` WHERE RAND() < 0.8);
CREATE OR REPLACE TABLE `myproject.mydataset.validation` AS
(SELECT * FROM `myproject.mydataset.mytable` WHERE RAND() < 0.2);
After training the model, you achieve an area under the receiver operating characteristic curve (AUC
ROC) value of 0.8, but after deploying the model to production, you notice that your model
performance has dropped to an AUC ROC value of 0.65. What problem is most likely occurring?
Options
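The two queries in this question each draw their own random numbers, so a row's membership in one table is independent of its membership in the other. The sketch below simulates that behavior in plain Python and contrasts it with a deterministic hash-based split. The row IDs, the seed, and the helper names are hypothetical and exist only for illustration; this is not BigQuery code.

```python
import hashlib
import random

def independent_rand_split(row_ids, seed=0):
    """Mimic two separate queries that each evaluate a fresh random number per row."""
    rng = random.Random(seed)
    training = {r for r in row_ids if rng.random() < 0.8}
    validation = {r for r in row_ids if rng.random() < 0.2}
    return training, validation

def hash_split(row_ids, train_percent=80):
    """Assign each row to exactly one set based on a stable hash of its ID."""
    training, validation = set(), set()
    for r in row_ids:
        digest = hashlib.sha256(str(r).encode("utf-8")).hexdigest()
        bucket = int(digest, 16) % 100
        if bucket < train_percent:
            training.add(r)
        else:
            validation.add(r)
    return training, validation

rows = list(range(10_000))  # hypothetical row IDs

train_a, valid_a = independent_rand_split(rows)
print("independent draws, rows in both sets:", len(train_a & valid_a))
print("independent draws, rows in neither set:", len(set(rows) - (train_a | valid_a)))

train_b, valid_b = hash_split(rows)
print("hash split, rows in both sets:", len(train_b & valid_b))
```

With independent draws, some rows land in both tables and some in neither, whereas the hash-based split partitions the rows into two disjoint sets.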
Q: 11
You work at a subscription-based company. You have trained an ensemble of trees and neural
networks to predict customer churn, which is the likelihood that customers will not renew their
yearly subscription. The average prediction is a 15% churn rate, but for a particular customer the
model predicts that they are 70% likely to churn. The customer has a product usage history of 30%, is
located in New York City, and became a customer in 1997. You need to explain the difference
between the actual prediction, a 70% churn rate, and the average prediction. You want to use Vertex
Explainable AI. What should you do?
Options
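To make the idea of explaining the gap between one prediction and the average prediction concrete, the sketch below computes exact Shapley values for a tiny model by enumerating every feature ordering. The toy model, its coefficients, the baseline, and the feature encoding are all hypothetical; this is an illustration of Shapley-style attribution in general, not the Vertex Explainable AI API or the model from the question.

```python
from itertools import permutations

def toy_churn_model(features):
    """Hypothetical churn score from [usage_fraction, in_nyc, years_as_customer]."""
    usage, in_nyc, years = features
    score = 0.1 + 0.5 * (1.0 - usage) + 0.05 * in_nyc
    score += 0.2 * (1.0 - usage) * (years > 20)  # interaction term
    return min(max(score, 0.0), 1.0)

def shapley_values(predict, instance, baseline):
    """Average each feature's marginal contribution over all feature orderings."""
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for ordering in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in ordering:
            current[i] = instance[i]  # switch feature i from baseline to actual value
            updated = predict(current)
            totals[i] += updated - previous
            previous = updated
    return [t / len(orderings) for t in totals]

INSTANCE = [0.3, 1, 27]  # hypothetical customer
BASELINE = [0.8, 0, 5]   # hypothetical "average" customer

attributions = shapley_values(toy_churn_model, INSTANCE, BASELINE)
gap = toy_churn_model(INSTANCE) - toy_churn_model(BASELINE)
print("per-feature attributions:", attributions)
print("sum of attributions:", sum(attributions), "prediction gap:", gap)
```

The attributions add up to the difference between the instance prediction and the baseline prediction. Exact enumeration grows factorially with the number of features, which is why sampled approximations are used for larger models.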
Q: 12
You are developing ML models with AI Platform for image segmentation on CT scans. You frequently
update your model architectures based on the newest available research papers, and have to rerun
training on the same dataset to benchmark their performance. You want to minimize computation
costs and manual intervention while having version control for your code. What should you do?
Options
Q: 13
You work for the AI team of an automobile company, and you are developing a visual defect
detection model using TensorFlow and Keras. To improve your model performance, you want to
incorporate some image augmentation functions such as translation, cropping, and contrast
tweaking. You randomly apply these functions to each training batch. You want to optimize your data
processing pipeline for run time and compute resources utilization. What should you do?
Options
Q: 14
You recently deployed a scikit-learn model to a Vertex AI endpoint. You are now testing the model on
live production traffic. While monitoring the endpoint, you discover twice as many requests per hour
than expected throughout the day. You want the endpoint to efficiently scale when the demand
increases in the future to prevent users from experiencing high latency. What should you do?
Options
Q: 15
You developed a custom model by using Vertex AI to predict your application's user churn rate. You
are using Vertex AI Model Monitoring for skew detection. The training data stored in BigQuery
contains two sets of features: demographic and behavioral. You later discover that two separate
models trained on each set perform better than the original model.
You need to configure a new model monitoring pipeline that splits traffic among the two models. You
want to use the same prediction-sampling-rate and monitoring-frequency for each model. You also
want to minimize management effort. What should you do?
Options
Q: 16
You need to develop an image classification model by using a large dataset that contains labeled
images in a Cloud Storage Bucket. What should you do?
Options
Q: 17
You have created a Vertex AI pipeline that includes two steps. The first step preprocesses 10 TB of data,
completes in about 1 hour, and saves the result in a Cloud Storage bucket. The second step uses the
processed data to train a model. You need to update the model's code to allow you to test different
algorithms. You want to reduce pipeline execution time and cost, while also minimizing pipeline
changes. What should you do?
Options
Q: 18
You are training an ML model on a large dataset. You are using a TPU to accelerate the training
process. You notice that the training process is taking longer than expected. You discover that the TPU
is not reaching its full capacity. What should you do?
Options
Q: 19
You are an ML engineer at a bank. You have developed a binary classification model using AutoML
Tables to predict whether a customer will make loan payments on time. The output is used to
approve or reject loan requests. One customer’s loan request has been rejected by your model, and
the bank’s risks department is asking you to provide the reasons that contributed to the model’s
decision. What should you do?
Options
Q: 20
You are working on a Neural Network-based project. The dataset provided to you has columns with
different ranges. While preparing the data for model training, you discover that gradient
optimization is having difficulty moving weights to a good solution. What should you do?
Options
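To illustrate what "columns with different ranges" means numerically, the sketch below rescales two columns to zero mean and unit variance (z-score standardization), which is one common way to put features on a comparable scale. The column names and values are hypothetical and are not taken from the question.

```python
import statistics

def standardize(column):
    """Rescale a numeric column to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(value - mean) / std for value in column]

# Hypothetical columns with very different ranges.
income = [20_000.0, 50_000.0, 120_000.0, 75_000.0]
age = [23.0, 35.0, 61.0, 44.0]

scaled_income = standardize(income)
scaled_age = standardize(age)

print("raw ranges:", max(income) - min(income), max(age) - min(age))
print("scaled ranges:", max(scaled_income) - min(scaled_income),
      max(scaled_age) - min(scaled_age))
```

After standardization both columns span a few units, so no single feature dominates the gradient purely because of its scale.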
Q: 21
You are training an ML model using data stored in BigQuery that contains several values that are
considered Personally Identifiable Information (PII). You need to reduce the sensitivity of the dataset
before training your model. Every column is critical to your model. How should you proceed?
Options
Q: 22
You are building a linear regression model on BigQuery ML to predict a customer's likelihood of
purchasing your company's products. Your model uses a city name variable as a key predictive
component. In order to train and serve the model, your data must be organized in columns. You want
to prepare your data using the least amount of coding while maintaining the predictable variables.
What should you do?
Options
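As background for a categorical predictor such as a city name, the sketch below shows one-hot encoding in plain Python: each distinct city becomes its own 0/1 column. The city names and the helper function are hypothetical, and the sketch illustrates the encoding itself rather than any BigQuery ML feature.

```python
def one_hot_encode(values):
    """Return the sorted category list and one 0/1 row per input value."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the column that matches this value
        rows.append(row)
    return categories, rows

cities = ["Paris", "Lagos", "Paris", "Osaka"]  # hypothetical data
columns, encoded = one_hot_encode(cities)

print(columns)
for city, row in zip(cities, encoded):
    print(city, row)
```

Each row contains exactly one 1, so the original city can always be recovered from the encoded columns.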
Q: 23
You recently trained an XGBoost model that you plan to deploy to production for online inference.
Before sending a predict request to your model's binary, you need to perform a simple data
preprocessing step. This step exposes a REST API that accepts requests in your internal VPC Service
Controls and returns predictions. You want to configure this preprocessing step while minimizing cost
and effort. What should you do?
Options
Q: 24
You have built a model that is trained on data stored in Parquet files. You access the data through a
Hive table hosted on Google Cloud. You preprocessed this data with PySpark and exported it as a
CSV file into Cloud Storage. After preprocessing, you execute additional steps to train and evaluate
your model. You want to parametrize this model training in Kubeflow Pipelines. What should you do?
Options
Q: 25
You trained a model on data stored in a Cloud Storage bucket. The model needs to be retrained
frequently in Vertex AI Training using the latest data in the bucket. Data preprocessing is required
prior to retraining. You want to build a simple and efficient near-real-time ML pipeline in Vertex AI
that will preprocess the data when new data arrives in the bucket. What should you do?
Options
Q: 26
You are developing a model to help your company create more targeted online advertising
campaigns. You need to create a dataset that you will use to train the model. You want to avoid
creating or reinforcing unfair bias in the model. What should you do?
Choose 2 answers
Options
Q: 27
You built a deep learning-based image classification model by using on-premises data.
You want to use Vertex AI to deploy the model to production. Due to security concerns, you cannot
move your data to the cloud. You are aware that the input data distribution might change over time.
You need to detect model performance changes in production. What should you do?
Options
Q: 28
You need to execute a batch prediction on 100 million records in a BigQuery table with a custom
TensorFlow DNN regressor model, and then store the predicted results in a BigQuery table. You want
to minimize the effort required to build this inference pipeline. What should you do?
Options
Q: 29
You need to design an architecture that serves asynchronous predictions to determine whether a
particular mission-critical machine part will fail. Your system collects data from multiple sensors from
the machine. You want to build a model that will predict a failure in the next N minutes, given the
average of each sensor’s data from the past 12 hours. How should you design the architecture?
Options
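The feature described in this question, the average of each sensor's data over the past 12 hours, is a sliding-window aggregate. The sketch below is a single-process Python stand-in for that computation, assuming readings for each sensor arrive in timestamp order; in a real system a streaming pipeline would typically handle it. The class name, sensor ID, and timestamps are hypothetical.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 12 * 60 * 60  # 12 hours

class RollingSensorAverages:
    """Keep a per-sensor average over a trailing time window."""

    def __init__(self, window_seconds=WINDOW_SECONDS):
        self.window_seconds = window_seconds
        self._readings = defaultdict(deque)  # sensor_id -> deque of (timestamp, value)
        self._sums = defaultdict(float)      # sensor_id -> sum of values in the window

    def add(self, sensor_id, timestamp, value):
        """Record a reading, then drop readings that fell out of the window."""
        readings = self._readings[sensor_id]
        readings.append((timestamp, value))
        self._sums[sensor_id] += value
        cutoff = timestamp - self.window_seconds
        while readings and readings[0][0] <= cutoff:
            _, old_value = readings.popleft()
            self._sums[sensor_id] -= old_value

    def averages(self):
        """Current windowed average for every sensor that has readings."""
        return {
            sensor_id: self._sums[sensor_id] / len(readings)
            for sensor_id, readings in self._readings.items()
            if readings
        }

aggregator = RollingSensorAverages()
aggregator.add("temperature", 0, 10.0)
aggregator.add("temperature", 3600, 20.0)
first = aggregator.averages()

# 13 hours after the first reading: both earlier readings are outside the window.
aggregator.add("temperature", 13 * 3600, 30.0)
second = aggregator.averages()
print(first, second)
```

The resulting dictionary of per-sensor averages is the kind of feature vector a model could consume to predict a failure in the next N minutes.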
Q: 30
You need to analyze user activity data from your company’s mobile applications. Your team will use
BigQuery for data analysis, transformation, and experimentation with ML algorithms. You need to
ensure real-time ingestion of the user activity data into BigQuery. What should you do?
Options
Q: 31
You want to train an AutoML model to predict house prices by using a small public dataset stored in
BigQuery. You need to prepare the data and want to use the simplest, most efficient approach. What
should you do?
Options
Q: 32
You need to develop a custom TensorFlow model that will be used for online predictions. The training
data is stored in BigQuery. You need to apply instance-level data transformations to the data for
model training and serving. You want to use the same preprocessing routine during model training
and serving. How should you configure the preprocessing routine?
Options
Q: 33
You are creating a social media app where pet owners can post images of their pets. You have one
million user uploaded images with hashtags. You want to build a comprehensive system that
recommends images to users that are similar in appearance to their own uploaded images.
What should you do?
Options
Q: 34
You work with a team of researchers to develop state-of-the-art algorithms for financial analysis.
Your team develops and debugs complex models in TensorFlow. You want to maintain the ease of
debugging while also reducing the model training time. How should you set up your training
environment?
Options
Q: 35
You manage a team of data scientists who use a cloud-based backend system to submit training jobs.
This system has become very difficult to administer, and you want to use a managed service instead.
The data scientists you work with use many different frameworks, including Keras, PyTorch, Theano,
scikit-learn, and custom libraries. What should you do?
Options