The best option for using Vertex AI Model Monitoring for drift detection while minimizing cost is to monitor both the features and the feature attributions, and to set a prediction-sampling-rate value that is closer to 0 than to 1. This lets you detect feature drift in the input prediction requests of custom models while reducing the storage and computation costs of the model monitoring job.

Vertex AI Model Monitoring is a service that monitors a deployed model's prediction input data for feature skew and drift. Feature drift occurs when the distribution of feature values in production changes over time. If the original training data is not available, you can still enable drift detection to monitor your models for feature drift. Vertex AI Model Monitoring uses TensorFlow Data Validation (TFDV) to calculate a distribution and a distance score for each feature, and compares them with a baseline distribution. For skew detection, the baseline is the statistical distribution of the feature's values in the training data. For drift detection, where training data is unavailable, the baseline is the statistical distribution of the feature's values seen in production in the recent past, so each new window of production data is compared against an earlier window. If the distance score for a feature exceeds an alerting threshold that you set, Vertex AI Model Monitoring sends you an email alert.
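For intuition: per the Vertex AI documentation, the distance score for numerical features is the Jensen-Shannon divergence between the baseline and current distributions (L-infinity distance is used for categorical features). Below is a minimal sketch of that comparison logic, assuming two already-binned distributions and a hypothetical per-feature threshold of 0.3:

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (base 2) between two binned distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log2(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Baseline: binned feature distribution from the reference window.
baseline = [0.25, 0.50, 0.25]
# Current: binned distribution from the latest production traffic.
current = [0.10, 0.40, 0.50]

ALERT_THRESHOLD = 0.3  # hypothetical alerting threshold for this feature
score = js_divergence(baseline, current)
if score >= ALERT_THRESHOLD:
    print(f"Drift alert: JS divergence {score:.3f} exceeds {ALERT_THRESHOLD}")
else:
    print(f"No alert: JS divergence {score:.3f}")
```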
For custom models, you can also enable feature attribution monitoring, which provides additional insight into feature drift. Feature attribution monitoring analyzes the feature attributions, that is, the contribution of each feature to the prediction output. It can help you identify the features that have the most impact on model performance and the features that drift most significantly over time, and it can help you understand the relationship between the features and the prediction output, as well as the correlations among the features [1].
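Note that feature attribution monitoring requires the model to be deployed with Vertex Explainable AI configured. As a quick sanity check that attributions are available, you can request an explanation from the endpoint; the project, endpoint ID, and feature names below are hypothetical:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")  # hypothetical

# Hypothetical endpoint ID; the deployed model must have an
# explanation spec configured for explain() to succeed.
endpoint = aiplatform.Endpoint("1234567890")

response = endpoint.explain(instances=[{"age": 42, "income": 55000.0}])
for explanation in response.explanations:
    for attribution in explanation.attributions:
        # Per-feature contribution of each input to this prediction.
        print(attribution.feature_attributions)
```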
The prediction-sampling-rate parameter determines the percentage of prediction requests that are logged and analyzed by the model monitoring job. A lower prediction-sampling-rate reduces the storage and computation costs of the monitoring job, but it also reduces the quality and validity of the monitored data: sampling fewer requests can introduce sampling bias and noise, and the monitoring job may miss important features or patterns in the data. Conversely, a higher prediction-sampling-rate increases the amount of data that must be processed and analyzed, and with it the storage and computation costs. There is therefore a trade-off between the prediction-sampling-rate and the cost and accuracy of the monitoring job, and the optimal value depends on the business objective and the data characteristics [2]. By monitoring the features and the feature attributions and setting a prediction-sampling-rate closer to 0 than to 1, you can use Vertex AI Model Monitoring for drift detection while minimizing cost.
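As a concrete illustration, such a job could be created with the Vertex AI Python SDK roughly as follows. This is a sketch, not a definitive recipe: the project, endpoint ID, feature names, and thresholds are hypothetical, and the exact SDK surface may vary by version.

```python
from google.cloud import aiplatform
from google.cloud.aiplatform import model_monitoring

aiplatform.init(project="my-project", location="us-central1")  # hypothetical

# Log and analyze ~10% of prediction requests (closer to 0 than to 1)
# to keep storage and computation costs down.
sampling = model_monitoring.RandomSampleConfig(sample_rate=0.1)

# No training data available: configure drift detection only,
# on both the features and the feature attributions.
drift = model_monitoring.DriftDetectionConfig(
    drift_thresholds={"age": 0.3, "income": 0.3},            # feature drift
    attribute_drift_thresholds={"age": 0.3, "income": 0.3},  # attribution drift
)
objective = model_monitoring.ObjectiveConfig(
    drift_detection_config=drift,
    explanation_config=model_monitoring.ExplanationConfig(),
)

job = aiplatform.ModelDeploymentMonitoringJob.create(
    display_name="drift-monitoring-job",
    endpoint="1234567890",  # hypothetical endpoint ID
    logging_sampling_strategy=sampling,
    schedule_config=model_monitoring.ScheduleConfig(monitor_interval=1),  # hours
    alert_config=model_monitoring.EmailAlertConfig(user_emails=["ml-team@example.com"]),
    objective_configs=objective,
)
```

The equivalent gcloud flags are --prediction-sampling-rate, --monitoring-frequency, --feature-thresholds, and --feature-attribution-thresholds on gcloud ai model-monitoring-jobs create.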
The other options are not as good as option D, for the following reasons:
Option A: Using only the features for monitoring and setting a monitoring-frequency value higher than the default would not enable feature attribution monitoring, and it could increase the cost of the model monitoring job. The monitoring-frequency parameter determines how often the monitoring job analyzes the logged prediction requests and calculates the distributions and distance scores for each feature. A higher monitoring-frequency makes the monitoring job more frequent and timely, but it also increases its computation costs. Moreover, monitoring only the features forgoes feature attribution monitoring, which provides additional insight into feature drift and model performance [1].
Option B: Using only the features for monitoring and setting a prediction-sampling-rate value closer to 1 than to 0 would not enable feature attribution monitoring, and it could increase the cost of the model monitoring job. A higher prediction-sampling-rate improves the quality and validity of the monitored data, but it also increases the storage and computation costs of the monitoring job. Moreover, monitoring only the features forgoes feature attribution monitoring, which provides additional insight into feature drift and model performance [1][2].
Option C: Using the features and the feature attributions for monitoring and setting a monitoring-frequency value lower than the default would enable feature attribution monitoring, but it would reduce the frequency and timeliness of the model monitoring job. A lower monitoring-frequency reduces the computation costs of the monitoring job, but it also makes the job less responsive and less effective at detecting and alerting on feature drift [1].
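In the Python SDK this trade-off corresponds to the monitor_interval of the schedule configuration; a sketch, assuming the interval is expressed in hours:

```python
from google.cloud.aiplatform import model_monitoring

# A longer interval between analyses lowers computation cost,
# but drift alerts can lag by up to a full interval.
low_frequency = model_monitoring.ScheduleConfig(monitor_interval=24)  # daily
high_frequency = model_monitoring.ScheduleConfig(monitor_interval=1)  # hourly
```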
References:
Preparing for Google Cloud Certification: Machine Learning Engineer, Course 3: Production ML Systems, Week 4: Evaluation
Google Cloud Professional Machine Learning Engineer Exam Guide, Section 3: Scaling ML models in production, 3.3: Monitoring ML models in production
Official Google Cloud Certified Professional Machine Learning Engineer Study Guide, Chapter 6: Production ML Systems, Section 6.3: Monitoring ML Models
Using Model Monitoring
Understanding the score threshold slider