View MLS-C01 Exam Questions

Q: 1

[Modeling] An agricultural company is interested in using machine learning to detect specific types of weeds in a 100-acre grassland field. Currently, the company uses tractor-mounted cameras to capture multiple images of the field as 10 × 10 grids. The company also has a large training dataset that consists of annotated images of popular weed classes like broadleaf and non-broadleaf docks. The company wants to build a weed detection model that will detect specific types of weeds and the location of each type within the field. Once the model is ready, it will be hosted on Amazon SageMaker endpoints. The model will perform real-time inferencing using the images captured by the cameras. Which approach should a Machine Learning Specialist take to obtain accurate predictions?

Options

Discussion

Aaron Y. Feb 20, 2026 9:59 pm

Nina O. Mar 5, 2026 2:22 pm

Yeah, gotta go with C. SageMaker SSD requires RecordIO for object detection, and image classification (A or D) won't return weed locations, which is needed here. Parquet format (B, D) is a distractor. Pretty sure about this but let me know if I'm off.

Olivia M. Mar 2, 2026 10:53 pm

A is wrong, C. RecordIO is required for SageMaker SSD, and classification alone won’t give you object locations.

BenH Feb 24, 2026 3:52 pm

Why wouldn't B work here? Isn't Parquet mostly for tabular data, and SSD on SageMaker expects RecordIO?

Luna Mar 4, 2026 9:33 pm

It's B. Went for Parquet with SSD because I thought Apache Parquet might be supported for object detection tasks too.

Drew M. Feb 16, 2026 4:25 am

C imo. For object detection with SageMaker SSD, RecordIO is the supported input, not Parquet, and you need bounding boxes for location data. Saw a similar question in practice sets, matches what AWS wants. Disagree?

Karan X. Feb 15, 2026 2:15 am

SageMaker docs are all over the place about formats, but yeah, C here.

Ryan Feb 14, 2026 8:47 am

Pretty sure it's C for this kind of weed location problem. Object detection like SSD can pinpoint where each weed is in the image, not just the type. RecordIO format is also what SageMaker object detection expects. Not 100% but makes sense with how SageMaker pipelines work.
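To make the classification-versus-detection point concrete, here is a minimal sketch of post-processing a detection response. It assumes the response shape documented for the SageMaker built-in object detection algorithm: a `prediction` list of `[class_index, score, xmin, ymin, xmax, ymax]` entries with coordinates normalized to the 0-1 range. The class names, threshold, image size, and sample response are all hypothetical.

```python
# Hypothetical class names; the real mapping depends on how training labels were indexed.
CLASS_NAMES = {0: "broadleaf_dock", 1: "non_broadleaf_dock"}


def extract_weeds(response, image_width, image_height, min_score=0.5):
    """Keep confident detections and convert normalized boxes to pixel coordinates."""
    weeds = []
    for class_index, score, xmin, ymin, xmax, ymax in response["prediction"]:
        if score < min_score:
            continue  # drop low-confidence detections
        weeds.append({
            "label": CLASS_NAMES.get(int(class_index), "unknown"),
            "score": score,
            "box_px": (
                round(xmin * image_width),
                round(ymin * image_height),
                round(xmax * image_width),
                round(ymax * image_height),
            ),
        })
    return weeds


# Made-up response, for illustration only.
sample = {"prediction": [[0.0, 0.92, 0.10, 0.20, 0.30, 0.50],
                         [1.0, 0.30, 0.60, 0.60, 0.80, 0.90]]}
detections = extract_weeds(sample, image_width=1000, image_height=800)
```

An image classifier would return only the label and score; the box is what lets you place each weed within the captured grid image.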

Q: 2

[Data Engineering] A retail company is ingesting purchasing records from its network of 20,000 stores to Amazon S3 by using Amazon Kinesis Data Firehose. The company uses a small, server-based application in each store to send the data to AWS over the internet. The company uses this data to train a machine learning model that is retrained each day. The company's data science team has identified existing attributes on these records that could be combined to create an improved model. Which change will create the required transformed records with the LEAST operational overhead?

Options

Discussion

Piya Feb 28, 2026 9:59 pm

Going with A here too. Using Lambda for transformation within Firehose avoids managing infrastructure, and AWS handles the scaling. The other choices involve running clusters or EC2, which is more to maintain. Pretty sure this is the least ops work, correct me if I missed something!

Liam Feb 28, 2026 4:41 pm

A. Using Lambda for transformation with Firehose means no server management, auto-scaling, and it's built right into the delivery stream. The other options add way more operational work. If everything fits in a Lambda, this is the easiest path. Tell me if I'm missing any curveballs here.

Riley M. Feb 23, 2026 6:08 pm

C/D? Both need more ops than A but can't tell if option D does anything the question wants.

Quinn L. Feb 16, 2026 8:45 am

It's A

Anita Feb 14, 2026 6:05 am

It's A. EMR in B is tempting but it’s overkill for just transforming Firehose records, way more management needed compared to Lambda.

Ajay P. Mar 5, 2026 11:42 am

Probably A, B is a trap since EMR clusters need way more management and scheduling effort.

Ethan R. Feb 26, 2026 7:02 am

Option A. Less overhead since you can plug Lambda straight into Firehose for transformation, no need to manage clusters or EC2. Not 100% but pretty sure that’s what AWS recommends for this setup.
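For anyone who hasn't written one, a Firehose transformation Lambda can be sketched roughly like this. The record contract (`recordId`, `result`, base64-encoded `data`) is the one Firehose expects from a transformation function; the attribute names `unit_price`, `quantity`, and `line_total` are hypothetical stand-ins for whatever the data science team wants to combine.

```python
import base64
import json


def lambda_handler(event, context):
    """Firehose data-transformation handler: add a derived attribute to each record."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            # Hypothetical derived feature combining two existing attributes.
            payload["line_total"] = payload["unit_price"] * payload["quantity"]
            encoded = base64.b64encode((json.dumps(payload) + "\n").encode("utf-8"))
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": encoded.decode("utf-8"),
            })
        except (ValueError, KeyError, TypeError):
            # Hand the original data back marked as failed so Firehose can
            # deliver it to its error output instead of silently dropping it.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Every incoming `recordId` has to appear in the response, which is why the failure branch still returns the record rather than skipping it.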

Morgan N. Mar 1, 2026 8:35 am

B tbh, but only if Lambda can't handle the transformation size or complexity. Otherwise A.

Ishaan Feb 20, 2026 9:01 pm

I don’t think it’s B. A is the only one with built-in transformation, less ops work.

Layla U. Feb 23, 2026 10:33 am

Q: 3

[Data Engineering] A large JSON dataset for a project has been uploaded to a private Amazon S3 bucket. The Machine Learning Specialist wants to securely access and explore the data from an Amazon SageMaker notebook instance. A new VPC was created and assigned to the Specialist. How can the privacy and integrity of the data stored in Amazon S3 be maintained while granting access to the Specialist for analysis?

Options

Discussion

Ivy N. Feb 26, 2026 10:00 am

C. ACLs in A are risky and presigned URLs in D don’t meet the privacy/integrity part. B is a trap since copying local doesn’t control S3 access. Saw a similar question in some practice, and C is the secure setup with VPC endpoint plus bucket policy. Correct me if I missed something on endpoint restrictions.

Ishaan Feb 26, 2026 4:28 am

C. Had something like this in a mock and C was the answer. VPC endpoint plus custom bucket policy limits access securely.

Rowan E. Feb 15, 2026 12:01 pm

My vote is C. Saw a similar question in my practice set, VPC endpoint with a custom bucket policy is the AWS way to lock down access. The other options are less secure for privacy, let me know if anyone else thinks otherwise.

Owen E. Feb 26, 2026 2:28 am

Don't think D is right. C locks down access with a VPC endpoint and bucket policy, so privacy/integrity are actually maintained. D's presigned URLs are more for temporary or external access, which is a trap here if you're thinking about security requirements.

Priya D. Feb 16, 2026 6:51 pm

A is wrong, C. Using an S3 VPC endpoint plus restricting bucket access to the VPC keeps your data private and protected. ACLs to everyone or presigned URLs aren’t secure enough here, pretty sure this matches AWS best practices.

Jamie S. Feb 15, 2026 7:38 am

These AWS exam questions always overcomplicate. B tbh, copying the dataset to local SageMaker volume after using the VPC endpoint sounds simpler.

Grace Feb 17, 2026 8:32 am

I don't think opening up S3 access with ACLs or pre-signed URLs (A, D) really fits the security requirement. Also, B copies the data locally which doesn't actually secure S3 access itself. C's combo of VPC endpoint plus strict bucket policy is best for privacy and integrity. Pretty sure that's what AWS recommends. Open to other takes if anyone thinks differently about endpoint scope though.

Emma J. Feb 17, 2026 10:55 pm

Doesn't C fail if the VPC endpoint isn't limited to just the S3 bucket? Could allow more access than needed. C

Luna J. Mar 3, 2026 3:52 pm

I don’t think B covers privacy fully. C is better because the custom S3 bucket policy restricts access to only your VPC, so no public or broad access like with ACLs or presigned URLs. Pretty sure this is the most secure way per AWS docs.
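A rough sketch of the kind of bucket policy being described, built as a Python dict so it can be serialized and attached. It uses the `aws:SourceVpce` condition key to deny any request that does not arrive through a specific S3 VPC endpoint. The bucket name and endpoint ID are hypothetical placeholders.

```python
import json

# Hypothetical identifiers; substitute the real bucket name and S3 VPC endpoint ID.
BUCKET = "example-ml-dataset-bucket"
VPC_ENDPOINT_ID = "vpce-0123456789abcdef0"

bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyAccessOutsideVpcEndpoint",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            # Cover both the bucket itself and every object in it.
            "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
            # Deny anything that does not come through the named endpoint.
            "Condition": {"StringNotEquals": {"aws:SourceVpce": VPC_ENDPOINT_ID}},
        }
    ],
}

policy_json = json.dumps(bucket_policy, indent=2)
```

Keep in mind that a blanket deny like this also blocks console and other access from outside the endpoint, so it is worth testing on a non-critical bucket first.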

Skyler Feb 21, 2026 10:25 pm

C vs B? Saw a similar question in an exam report and the correct pick was C, makes sense since VPC endpoint plus bucket policy keeps data private, with no need to copy locally. Anyone else see the same?

Q: 4

[Data Engineering] A medical imaging company wants to train a computer vision model to detect areas of concern on patients' CT scans. The company has a large collection of unlabeled CT scans that are linked to each patient and stored in an Amazon S3 bucket. The scans must be accessible to authorized users only. A machine learning engineer needs to build a labeling pipeline. Which set of steps should the engineer take to build the labeling pipeline with the LEAST effort?

Options

Discussion

Nathan Feb 27, 2026 6:17 pm

C. Private workforce in SageMaker Ground Truth covers "authorized users only" since you control access, and using the built-in bounding box task means no custom labeling app needed. That keeps effort low compared to building your own tools or using Mechanical Turk (which isn't private enough for this kind of medical data). Unless there's some hidden requirement I missed, C fits best here.

Jack G. Mar 1, 2026 10:49 am

C. You get private access control with SageMaker Ground Truth, and the bounding box task is built-in so no custom tool needed. Seems quickest for sensitive healthcare data, though open to other takes if I missed something.

Luna M. Feb 19, 2026 8:19 pm

C tbh, had something like this in a mock exam and C was correct for private, low-effort setup.

Sanjay Feb 15, 2026 12:28 pm

I think this is the same as a common exam question; in practice, pretty sure the answer is C.

Ishaan D. Feb 16, 2026 2:49 am

C imo. Only private workforce in SageMaker Ground Truth lines up with "authorized users only" and least setup effort.

MasonQ Feb 20, 2026 10:41 pm

I remember a similar scenario from labs and C was always the go-to. Setting up a private workforce in SageMaker Ground Truth with the bounding box task saves tons of setup compared to custom solutions, and it handles access for authorized users. Pretty sure that's what AWS expects here, but happy to hear if anyone has a real-world counterexample?

RaviL Mar 3, 2026 6:23 pm

D imo

QuickSec2920 Mar 6, 2026 8:11 am

I’d say C here. Using a private workforce in SageMaker Ground Truth matches the need for restricting access to just authorized users, and the built-in bounding box task saves a ton of setup. Rest of the options are more work or don't properly handle PHI. Anyone disagree?

Quinn E. Feb 15, 2026 1:40 pm

B. Mechanical Turk plus built-in Ground Truth tasks is pretty fast to set up. Saw similar in some practice guides. Only thing is the private data, but for least effort B looks close. Anyone see a rule against it in official docs?

QuinnI Feb 19, 2026 10:41 pm

Q: 5

[Data Engineering] A machine learning specialist is preparing data for training on Amazon SageMaker. The specialist is using one of the SageMaker built-in algorithms for the training. The dataset is stored in .CSV format and is transformed into a numpy.array, which appears to be negatively affecting the speed of the training. What should the specialist do to optimize the data for training on SageMaker?

Options

Discussion

Luna L. Feb 21, 2026 3:01 pm

Option C. Had something like this in a mock, and SageMaker built-in algorithms are definitely optimized for RecordIO protobuf format. Using numpy arrays can slow down training since the built-ins expect RecordIO for efficiency. Not 100% but pretty sure C is the way to go here, agree?

Piya C. Mar 2, 2026 11:24 pm

It's C here. The built-in SageMaker algorithms are optimized for RecordIO protobuf, not numpy arrays or even Parquet, which is mostly about storage efficiency. D feels like a trap: hyperparameter optimization only tunes model params, not the raw data format. Pretty sure C, but if I missed something let me know.

Rowan L. Mar 5, 2026 3:23 am

Could honestly see a case for B but C matches best for boosting SageMaker training speed. Not totally sure, anyone pick B?

Logan B. Mar 4, 2026 8:58 pm

D. RecordIO protobuf is the efficient choice here, but D could trip folks up since hyperparameter optimization doesn't touch data format at all. So C for speed.

Ava Y. Feb 27, 2026 9:46 pm

Ben V. Feb 25, 2026 5:33 pm

RecordIO protobuf (C) makes sense since it's the format SageMaker built-ins ingest most efficiently. Numpy arrays aren't optimized for distributed training there. Pretty sure that's what AWS recommends for speed, but open if anyone saw otherwise in practice.

Priya S. Feb 24, 2026 2:28 pm

Probably C here. RecordIO protobuf is what SageMaker wants for built-in algos, so that's gonna speed things up vs numpy arrays.

Nathan E. Mar 1, 2026 3:12 am

It's C. RecordIO protobuf is specifically used by SageMaker built-in algorithms for better speed and parallelism. DataFrames or batch transform won’t improve training performance for built-ins. I’m pretty sure about this but open to feedback if anyone heard different.

Mia S. Feb 27, 2026 12:59 pm

Transforming to RecordIO format is what SageMaker built-ins work best with, so C.

Q: 6

[Data Engineering] A company wants to predict stock market price trends. The company stores stock market data each business day in Amazon S3 in Apache Parquet format. The company stores 20 GB of data each day for each stock code. A data engineer must use Apache Spark to perform batch preprocessing data transformations quickly so the company can complete prediction jobs before the stock market opens the next day. The company plans to track more stock market codes and needs a way to scale the preprocessing data transformations. Which AWS service or feature will meet these requirements with the LEAST development effort over time?

Options

Discussion

Sara J. Mar 5, 2026 4:30 pm

EmmaY Feb 21, 2026 3:06 am

A tbh

Skyler R. Feb 23, 2026 3:46 pm

I don’t think EMR (B) is right for "least dev effort". Glue (A) offers managed Spark, scales easily, and there’s basically no infrastructure to maintain. EMR’s more hands-on and can be overkill unless deep Spark tuning is needed. Athena and Lambda just don’t fit here. Open to being corrected if anyone has counter examples from recent AWS docs.

Leo Y. Feb 24, 2026 3:32 pm

Actually it's A here. EMR (B) is a trap because it takes more ongoing setup and management, while Glue handles Spark with way less dev effort as they want. Athena and Lambda just can't do this scale for Spark batch jobs. If anyone's seen a newer AWS recommendation that changes this, let me know.

Chris I. Feb 26, 2026 3:30 am

Probably B here since EMR handles Spark natively and you get a lot of flexibility for scaling big batch jobs. Glue is less manual but I think EMR's better for heavy Spark workflows. Not totally sure though, open to feedback.

SharpTester8105 Feb 22, 2026 3:37 pm

I've seen similar practice questions, and the official AWS guide covers Glue for this scenario. Going with B.

Quinn X. Feb 28, 2026 5:58 am

Ugh, AWS loves to push Glue for all ETL. I was thinking B at first since EMR gives you more direct Spark control, but the setup's a pain if you want to scale out later or just manage tons of jobs. Wouldn't Athena be easier here?

Q: 7

[Modeling] A beauty supply store wants to understand some characteristics of visitors to the store. The store has security video recordings from the past several years. The store wants to generate a report of hourly visitors from the recordings. The report should group visitors by hair style and hair color. Which solution will meet these requirements with the LEAST amount of effort?

Options

Discussion

Sara W. Mar 4, 2026 11:16 am

C or A. If we strictly care about the least effort, semantic segmentation (C) should work best for isolating hair regions, then ResNet-50 is purpose-built for classes like hairstyle and color. But object detection (A) could get close if the hair region is obvious in most frames. I think C has the edge, but in a dataset with super clean frames, maybe A isn't as much extra work. Anyone disagree?

Leo Feb 23, 2026 12:16 pm

For me, C, semantic segmentation plus ResNet-50 lines up with least effort for visual grouping like this.

Liam Mar 1, 2026 7:48 am

C. XGBoost is a trap here since it's not really for image stuff. Seen similar logic in practice questions too.

Meera V. Feb 26, 2026 4:51 am

C imo

Riley R. Feb 27, 2026 2:21 pm

Nah, XGBoost is a common trap for image stuff. C here.

Ravi Feb 15, 2026 7:58 am

C imo, had something like this in a mock and ResNet-50 after semantic segmentation fit best for image tasks.

Piya J. Feb 19, 2026 4:21 pm

B, since XGBoost is great for classification. I realize it's usually for tabular data so maybe I'm missing something here, but it seems like with the right features from object detection this could work? Not 100% sure.

Layla Feb 18, 2026 5:10 pm

C, matches what I’ve seen in exam reports and AWS official guide samples for image-based tasks.

Quinn D. Mar 5, 2026 10:04 am

Likely C, since segmentation handles pixel-wise hair extraction for vision tasks and ResNet-50 works better for image classifications. XGBoost is more tabular.

Luke I. Feb 19, 2026 11:00 am

C tbh, you can double check in the official guide or AWS practice exam for similar flows.

Q: 8

[Data Engineering] A Machine Learning Specialist is developing a daily ETL workflow containing multiple ETL jobs. The workflow consists of the following processes:
* Start the workflow as soon as data is uploaded to Amazon S3.
* When all the datasets are available in Amazon S3, start an ETL job to join the uploaded datasets with multiple terabyte-sized datasets already stored in Amazon S3.
* Store the results of joining datasets in Amazon S3.
* If one of the jobs fails, send a notification to the Administrator.

Which configuration will meet these requirements?

Options

Discussion

Taylor F. Mar 2, 2026 8:40 am

Option A looks right to me. Step Functions can coordinate the ETL workflow and wait for uploads, then Glue is ideal for joining large datasets in S3. CloudWatch with SNS covers notifications. Pretty sure this matches the scenario, unless I’m missing something.

Jamie V. Feb 23, 2026 7:14 pm

Makes sense to pick A here. Step Functions can coordinate all dataset arrival events before kicking off Glue, which is ideal for large data joins, plus CloudWatch + SNS covers the failure notification part. Pretty sure that's the setup they want, unless I'm missing something.
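As a rough illustration of the Step Functions plus Glue part, here is a minimal state machine definition written as a Python dict. Note the assumptions: it uses a `Catch` that publishes straight to SNS rather than the CloudWatch-based alerting mentioned above, it leaves out the S3-upload trigger and the "wait for all datasets" logic entirely, and the Glue job name and topic ARN are hypothetical.

```python
import json

# Hypothetical names; replace with the real Glue job and SNS topic.
GLUE_JOB_NAME = "join-uploaded-datasets"
SNS_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:etl-failures"

state_machine = {
    "Comment": "Run the Glue join job and notify the administrator on failure",
    "StartAt": "RunJoinJob",
    "States": {
        "RunJoinJob": {
            "Type": "Task",
            # The .sync suffix makes Step Functions wait for the Glue job to finish.
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": GLUE_JOB_NAME},
            # Any error in the job routes to the notification state.
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyAdministrator"}],
            "End": True,
        },
        "NotifyAdministrator": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sns:publish",
            "Parameters": {"TopicArn": SNS_TOPIC_ARN, "Message": "ETL join job failed"},
            "Next": "WorkflowFailed",
        },
        # End in a Fail state so the execution is still recorded as failed.
        "WorkflowFailed": {"Type": "Fail"},
    },
}

definition_json = json.dumps(state_machine, indent=2)
```

The heavy join itself stays in Glue; the state machine only sequences the job and handles the failure path.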

Anita Y. Feb 25, 2026 5:39 pm

C/D? I'm not convinced D can handle multi-terabyte joins (Lambda limits), but C's use of AWS Batch doesn't really fit the trigger/flow requirements here either. I'd probably still say A is best, but curious if there's a real-world case where C could work. Anyone see C actually being used like this?

Hannah R. Feb 27, 2026 11:26 am

C/D? But I don't think D can handle big datasets well, Lambda chaining hits limits with multi-TB data. C doesn't really match the workflow triggers. Pretty sure A's Step Functions with Glue is the design AWS pushes for this kind of use case. Disagree?

QuinnI Mar 1, 2026 11:42 am

Why do you think Lambda chaining (D) works with multi-terabyte S3 joins? Isn't Glue better for that size?

Adam Feb 18, 2026 11:47 pm

Probably A for this one

Meera Feb 27, 2026 11:33 pm

D. Had something like this in a mock, setup was Lambda chain for the ETL parts.

Q: 9

[Modeling] While working on a neural network project, a Machine Learning Specialist discovers that some features in the data have very high magnitude, resulting in this data being weighted more in the cost function. What should the Specialist do to ensure better convergence during backpropagation?

Options

Discussion

Vikram K. Feb 23, 2026 1:36 pm

Option B makes sense here. When features have very different scales, normalization is key since it evens out their influence on the cost function and helps gradients flow better during backprop. Regularization (C) helps with overfitting but doesn't fix scaling problems directly. I'm confident about B, but open if someone has another take.

Luna Feb 14, 2026 11:48 am

I don't think it's A. Dimensionality reduction drops features but doesn't solve the scaling issue. I usually see B (data normalization) in practice for this, since it keeps big features from overpowering the others during backprop. C is tempting but more about overfitting. Anyone see it different?

Aisha R. Feb 23, 2026 5:27 am

B. High magnitude features skew the cost function in neural nets, so normalization is pretty much the go-to move for better convergence during backprop. A is kind of a trap since dimensionality reduction doesn't really fix magnitude issues. Let me know if you see it differently.

CarefulMentor337 Feb 19, 2026 2:30 pm

Option A. Had something like this in a mock and picked it for feature magnitude.

Chloe X. Mar 3, 2026 6:17 pm

C's not right here. B is what fixes feature scaling issues for neural nets, so that's the one I'd pick.

Ivy I. Mar 3, 2026 5:28 pm

I don't think it's A. Features with huge magnitude mess up convergence unless you normalize, so B fits better here.

DirectMentor8352 Feb 24, 2026 11:10 pm

It's B here, but here's a gotcha: if you were using something like tree-based models (say, XGBoost), feature scale actually wouldn't impact convergence much. Since this is about neural nets and backprop, normalization's necessary. Wouldn't pick B for every ML algorithm; it depends on the optimizer too.

Alex Feb 15, 2026 9:23 am

B tbh for sure, but small nitpick: if the model used scale-invariant algorithms, normalization wouldn't strictly be required for convergence. Here though, normalizing is key unless you know the model handles scaling internally.

Jack Mar 2, 2026 2:57 am

Probably B. Normalization fixes the scale so no feature dominates, unlike option A which just reduces dimensions but not the magnitude issue.

Parker O. Feb 20, 2026 8:00 pm

B fits here. Normalizing the data puts all features on a similar scale so no single feature dominates the loss function. This improves convergence during training. Pretty sure it's not A or D.
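A quick sketch of what normalization does to features on very different scales, using z-score standardization from the standard library only. The feature names and values are made up for illustration, and standardization is just one common choice (min-max scaling is another).

```python
import statistics


def standardize(column):
    """Rescale a feature column to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        return [0.0 for _ in column]  # a constant feature carries no signal
    return [(x - mean) / std for x in column]


# Made-up data: income dwarfs age before scaling.
income = [30_000.0, 60_000.0, 90_000.0, 120_000.0]
age = [25.0, 35.0, 45.0, 55.0]

income_scaled = standardize(income)
age_scaled = standardize(age)
```

After scaling, both columns span the same range, so neither dominates the gradient. In a real pipeline the mean and standard deviation would be computed on the training split only and then reused for validation and inference data.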

Q: 10

[Data Engineering] A company has raw user and transaction data stored in Amazon S3, a MySQL database, and Amazon Redshift. A Data Scientist needs to perform an analysis by joining the three datasets from Amazon S3, MySQL, and Amazon Redshift, and then calculating the average of a few selected columns from the joined data. Which AWS service should the Data Scientist use?

Options

Discussion

Luna Y. Feb 20, 2026 12:01 am

A. Athena supports federated queries with connectors for S3, MySQL, and Redshift, so it can join across all three sources in a single query. B is tempting but Spectrum can’t directly query MySQL. Correct me if I missed something though!

Nina J. Mar 4, 2026 11:44 pm

Option A is what I'd pick here. Pretty sure Athena can query directly from S3, and it can also use federated queries to pull data from MySQL and Redshift. Not 100% because I get confused with Glue sometimes, but Athena seems best for actually joining across these sources and running SQL-type analysis. Anyone else agree?

Sam H. Mar 2, 2026 6:27 pm

I don’t think it’s A. B. Spectrum feels like the right tool since it works directly with S3 and Redshift, plus I’ve seen folks trip up on Athena vs Spectrum for these kinds of integrations.

Neha Mar 6, 2026 4:48 am

Taylor P. Feb 24, 2026 6:32 am

A tbh

Ishaan I. Feb 26, 2026 7:49 pm

Probably A. Saw a similar question on an earlier practice, Athena can join across S3, MySQL, and Redshift for analysis.

Anita X. Feb 14, 2026 8:45 pm

A, had something like this in a mock. Athena supports federated queries across S3, MySQL, and Redshift, so you can join them directly for analysis. Not fully sure if Glue could do it too but Athena lines up with what I've seen.

Luna P. Feb 22, 2026 3:27 pm

B or C? I know Athena (A) does federated queries, but Redshift Spectrum (B) can also bridge S3 and Redshift. Not sure if MySQL counts as a trap here, but B keeps coming up in practice sets.

Ben P. Mar 2, 2026 11:49 pm

C. AWS Glue could work if you wanted to join sources and transform data before analysis, right? It's got crawlers and jobs for ETL, so kinda fits when pulling from S3, MySQL, and Redshift. Not totally sure though, maybe that's overkill for just analysis.

Chris Q. Feb 28, 2026 2:21 am
