Amazon MLS-C01
Q: 1
[Modeling]
An agricultural company is interested in using machine learning to detect specific types of weeds in a
100-acre grassland field. Currently, the company uses tractor-mounted cameras to capture multiple
images of the field as 10 × 10 grids. The company also has a large training dataset that consists of
annotated images of popular weed classes like broadleaf and non-broadleaf docks.
The company wants to build a weed detection model that will detect specific types of weeds and the
location of each type within the field. Once the model is ready, it will be hosted on Amazon
SageMaker endpoints. The model will perform real-time inferencing using the images captured by
the cameras.
Which approach should a Machine Learning Specialist take to obtain accurate predictions?
Options
Q: 2
[Data Engineering]
A retail company is ingesting purchasing records from its network of 20,000 stores to Amazon S3 by
using Amazon Kinesis Data Firehose. The company uses a small, server-based application in each
store to send the data to AWS over the internet. The company uses this data to train a machine
learning model that is retrained each day. The company's data science team has identified existing
attributes on these records that could be combined to create an improved model.
Which change will create the required transformed records with the LEAST operational overhead?
Options
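For context on Q2: a common low-overhead pattern for enriching records in flight (not necessarily the graded answer) is to attach a transformation Lambda function to the Kinesis Data Firehose delivery stream. Firehose invokes the function with a batch of base64-encoded records and expects each one back with its `recordId`, a result status, and re-encoded data. A minimal sketch, assuming JSON purchase records with hypothetical `quantity` and `unit_price` fields that are combined into a derived attribute:

```python
import base64
import json


def lambda_handler(event, context):
    """Sketch of a Kinesis Data Firehose transformation Lambda.

    Each incoming record carries base64-encoded data; each outgoing
    record must echo the recordId, report a result ("Ok", "Dropped",
    or "ProcessingFailed"), and return re-encoded data.
    """
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        # Hypothetical derived attribute combining two existing fields.
        payload["revenue"] = payload.get("quantity", 0) * payload.get("unit_price", 0.0)
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(json.dumps(payload).encode()).decode(),
        })
    return {"records": output}
```

Because the transformation runs inside the delivery stream itself, no changes are needed in the 20,000 store applications.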
Q: 3
[Data Engineering]
A large JSON dataset for a project has been uploaded to a private Amazon S3 bucket. The Machine
Learning Specialist wants to securely access and explore the data from an Amazon SageMaker
notebook instance. A new VPC was created and assigned to the Specialist.
How can the privacy and integrity of the data stored in Amazon S3 be maintained while granting
access to the Specialist for analysis?
Options
Q: 4
[Data Engineering]
A medical imaging company wants to train a computer vision model to detect areas of concern on
patients' CT scans. The company has a large collection of unlabeled CT scans that are linked to each
patient and stored in an Amazon S3 bucket. The scans must be accessible to authorized users only. A
machine learning engineer needs to build a labeling pipeline.
Which set of steps should the engineer take to build the labeling pipeline with the LEAST effort?
Options
Q: 5
[Data Engineering]
A machine learning specialist is preparing data for training on Amazon SageMaker. The specialist is
using one of the SageMaker built-in algorithms for the training. The dataset is stored in .CSV format
and is transformed into a numpy.array, which appears to be negatively affecting the speed of the
training.
What should the specialist do to optimize the data for training on SageMaker?
Options
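For context on Q5: SageMaker's built-in algorithms generally train fastest on contiguous float32 data serialized as RecordIO-protobuf rather than on CSV text or a default-dtype numpy array (the SageMaker Python SDK offers `sagemaker.amazon.common.write_numpy_to_dense_tensor` for the final serialization step). A sketch of the dtype preparation stage using only numpy:

```python
import io

import numpy as np


def prepare_for_training(csv_text: str) -> np.ndarray:
    """Load CSV rows into a contiguous float32 matrix.

    float32 is the dtype the SageMaker built-in algorithms expect;
    from here the array can be serialized to RecordIO-protobuf
    (e.g. with sagemaker.amazon.common.write_numpy_to_dense_tensor)
    instead of being handed to training as CSV.
    """
    arr = np.genfromtxt(io.StringIO(csv_text), delimiter=",")
    return np.ascontiguousarray(arr, dtype=np.float32)
```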
Q: 6
[Data Engineering]
A company wants to predict stock market price trends. The company stores stock market data each
business day in Amazon S3 in Apache Parquet format. The company stores 20 GB of data each day for
each stock code.
A data engineer must use Apache Spark to perform batch preprocessing data transformations quickly
so the company can complete prediction jobs before the stock market opens the next day. The
company plans to track more stock market codes and needs a way to scale the preprocessing data
transformations.
Which AWS service or feature will meet these requirements with the LEAST development effort over
time?
Options
Q: 7
[Modeling]
A beauty supply store wants to understand some characteristics of visitors to the store. The store has
security video recordings from the past several years. The store wants to generate a report of hourly
visitors from the recordings. The report should group visitors by hair style and hair color.
Which solution will meet these requirements with the LEAST amount of effort?
Options
Q: 8
[Data Engineering]
A Machine Learning Specialist is developing a daily ETL workflow containing multiple ETL jobs. The
workflow consists of the following processes:
* Start the workflow as soon as data is uploaded to Amazon S3
* When all the datasets are available in Amazon S3, start an ETL job to join the uploaded datasets
with multiple terabyte-sized datasets already stored in Amazon S3
* Store the results of joining datasets in Amazon S3
* If one of the jobs fails, send a notification to the Administrator
Which configuration will meet these requirements?
Options
Q: 9
[Modeling]
While working on a neural network project, a Machine Learning Specialist discovers that some
features in the data have very high magnitude, resulting in this data being weighted more heavily in
the cost function.
What should the Specialist do to ensure better convergence during backpropagation?
Options
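For context on Q9: this scenario is the classic motivation for feature scaling. One standard remedy is to standardize each feature to zero mean and unit variance so that no high-magnitude feature dominates the cost function and its gradients. A minimal numpy sketch:

```python
import numpy as np


def standardize(X: np.ndarray) -> np.ndarray:
    """Scale each column (feature) to zero mean and unit variance.

    This keeps features with large raw magnitudes from dominating
    the cost function during backpropagation.
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    return (X - mean) / std
```

In practice the mean and standard deviation are computed on the training set only and reused when scaling validation and inference data.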
Q: 10
[Data Engineering]
A company has raw user and transaction data stored in Amazon S3, a MySQL database, and Amazon
Redshift. A Data Scientist needs to perform an analysis by joining the three datasets from Amazon S3,
MySQL, and Amazon Redshift, and then calculating the average of a few selected columns from the
joined data.
Which AWS service should the Data Scientist use?
Options