The best option for developing an image classification model by using a large dataset that contains
labeled images in a Cloud Storage bucket is to import the labeled images as a managed dataset in
Vertex AI and use AutoML to train the model. This option allows you to leverage the power and
simplicity of Google Cloud to create and deploy a high-quality image classification model with
minimal code and configuration. Vertex AI is a unified platform for building and deploying machine
learning solutions on Google Cloud; it can create a managed dataset from a Cloud Storage bucket of
labeled images and use that dataset to train an AutoML model. AutoML automatically builds and
optimizes machine learning models for tasks such as image classification, object detection, natural
language processing, and tabular data analysis, handling the complex aspects of machine learning:
feature engineering, model architecture selection, hyperparameter tuning, and model evaluation.
AutoML can also evaluate, deploy, and monitor the trained image classification model and serve
online or batch predictions. By using Vertex AI and AutoML, users can develop an image
classification model from a large dataset with ease and efficiency.
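For a sense of what the import step involves: Vertex AI managed image datasets are imported from a CSV (or JSONL) file listing each image's Cloud Storage URI and its label. The bucket path and labels below are hypothetical placeholders; this is a minimal sketch of generating such an import file, not a call to the Vertex AI API itself:

```python
# Build a Vertex AI image-dataset import CSV: one "gs://uri,label" row per image.
# Bucket name and labels are hypothetical placeholders.
labeled_images = {
    "gs://my-bucket/images/cat_001.jpg": "cat",
    "gs://my-bucket/images/dog_001.jpg": "dog",
}

def build_import_csv(labeled_images):
    """Return CSV text in the single-label image classification import format."""
    return "\n".join(f"{uri},{label}" for uri, label in sorted(labeled_images.items()))

csv_text = build_import_csv(labeled_images)
print(csv_text)
```

The resulting file would be uploaded to Cloud Storage and referenced when creating the managed dataset, after which AutoML training can be started from the console or the Vertex AI SDK.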
The other options are not as good as option C, for the following reasons:
Option A: Using Vertex AI Pipelines with the Kubeflow Pipelines SDK to create a pipeline that reads
the images from Cloud Storage and trains the model would require more skills and steps than using
Vertex AI and AutoML. Vertex AI Pipelines is a service that orchestrates machine learning
workflows on Vertex AI; it can run preprocessing and training steps on custom Docker images and
evaluate, deploy, and monitor the resulting model. The Kubeflow Pipelines SDK is a Python library
for creating and running pipelines on Vertex AI Pipelines or on Kubeflow, an open-source platform
for machine learning on Kubernetes. However, this approach would require writing code, building
Docker images, defining pipeline components and steps, and managing pipeline execution and
artifacts. Moreover, Vertex AI Pipelines and the Kubeflow Pipelines SDK are not specialized for
image classification, so users would also need a framework such as TensorFlow or PyTorch to build
and train the model itself.
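To illustrate the extra authoring burden, here is a stdlib-only sketch of the kind of step wiring a pipeline author must define by hand. This is deliberately not the real Kubeflow Pipelines SDK API (real pipelines declare containerized components with the kfp SDK); every function and path here is a hypothetical placeholder:

```python
# Stdlib-only sketch of hand-wired pipeline steps; illustrative only.
# A real Vertex AI / Kubeflow pipeline defines each step as a containerized
# component via the kfp SDK rather than as in-process functions.

def read_images(gcs_prefix):
    # Placeholder: a real step would list and download objects from Cloud Storage.
    return [f"{gcs_prefix}/img_{i}.jpg" for i in range(3)]

def preprocess(paths):
    # Placeholder: a real step would decode, resize, and normalize each image.
    return [{"path": p, "pixels": [0.0] * 4} for p in paths]

def train(examples):
    # Placeholder: a real step would train a TensorFlow or PyTorch model.
    return {"model": "classifier-v1", "num_examples": len(examples)}

def run_pipeline(gcs_prefix):
    # The pipeline author is responsible for ordering steps and passing artifacts.
    return train(preprocess(read_images(gcs_prefix)))

artifact = run_pipeline("gs://my-bucket/images")
print(artifact)
```

With AutoML, all of this wiring (plus containerization and execution management) is handled by the service.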
Option B: Using Vertex AI Pipelines with TensorFlow Extended (TFX) to create a pipeline that reads
the images from Cloud Storage and trains the model would require more skills and steps than using
Vertex AI and AutoML. TensorFlow Extended (TFX) is a framework for creating and running
end-to-end machine learning pipelines built on TensorFlow, a popular library for building and
training deep learning models. TFX can preprocess the data, train and evaluate the model, validate
and push it, and serve it for online or batch predictions. However, using Vertex AI Pipelines with
TFX would likewise require writing code, building Docker images, defining pipeline components and
steps, and managing pipeline execution and artifacts. Moreover, TFX is not specialized for image
classification out of the box: users would still need to configure its components, such as
TensorFlow Data Validation and TensorFlow Transform, for image data and supply a suitable model
architecture, for example from TensorFlow Hub.
Option D: Converting the image dataset to a tabular format using Dataflow, loading the data into
BigQuery, and using BigQuery ML to train the model would not handle the image data properly and
could result in poor model performance. Dataflow is a service that can create scalable and reliable
pipelines to process large volumes of data from various sources. Dataflow can preprocess the data by
using Apache Beam, a programming model for defining and executing data processing workflows.
BigQuery is a serverless, scalable, and cost-effective data warehouse that can perform fast and
interactive queries on large datasets. BigQuery ML is a service that can create and train machine
learning models by using SQL queries on BigQuery. However, converting the image data to a tabular
format would lose the spatial and semantic information of the images, which is essential for image
classification. Moreover, BigQuery ML is not designed for image classification: techniques such as
feature hashing, embeddings, or one-hot encoding apply to categorical features and cannot recover
the pixel relationships that an image model would learn directly.
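A toy illustration of why flattening loses spatial information (not how Dataflow or BigQuery ML actually process data): shifting an image by a single pixel leaves it visually almost unchanged, yet rewrites most of the columns in its flattened tabular row, so a tabular model sees a nearly unrelated record:

```python
# Toy 3x3 "image" containing a diagonal stroke; values are pixel intensities.
def flatten(img):
    # Tabular conversion: each pixel becomes an independent column.
    return [p for row in img for p in row]

def shift_right(img):
    # Translate the image one pixel to the right (semantically the same image).
    return [[0] + row[:-1] for row in img]

img = [[0, 0, 1],
       [0, 1, 0],
       [1, 0, 0]]

a = flatten(img)
b = flatten(shift_right(img))
changed = sum(1 for x, y in zip(a, b) if x != y)
print(f"{changed} of {len(a)} tabular columns changed")  # most columns differ
```

A convolutional model is largely invariant to such small translations because it learns from pixel neighborhoods; a tabular model treats each column independently and has no notion of adjacency.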