Q: 6
Case study
An ML engineer is developing a fraud detection model on AWS. The training dataset includes
transaction logs, customer profiles, and tables from an on-premises MySQL database. The
transaction logs and customer profiles are stored in Amazon S3.
The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally,
many of the features have interdependencies. The algorithm is not capturing all the desired
underlying patterns in the data.
Before the ML engineer trains the model, the ML engineer must resolve the issue of the imbalanced
data.
Which solution will meet this requirement with the LEAST operational effort?
Options
Discussion
Option D. Data Wrangler automates the balancing step, so it takes the least effort of the options in this case.
D. Data Wrangler makes oversampling for class imbalance very easy, with barely any manual setup compared to the others.
Pretty sure it's D; I encountered a very similar question in my exam. SageMaker Data Wrangler's balance data feature does this with almost no manual steps compared to the others.
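For anyone unsure what "oversampling" means in the comments above, here is a minimal sketch of random oversampling in plain Python. It only illustrates the idea and is not how Data Wrangler implements its balance feature; the helper name `random_oversample` and the toy transaction data are made up for this example.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Hypothetical helper: duplicate randomly chosen rows of each smaller
    class until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    # Start from the original data and append the duplicated rows.
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Toy data (assumed): 8 legitimate transactions (0) and 2 fraudulent ones (1).
rows = [{"amount": a} for a in (12, 40, 7, 55, 23, 9, 31, 18, 900, 1200)]
labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

After the call, both classes have 8 rows each. Doing this by hand means writing and maintaining code like the above for every dataset, which is the manual effort the managed option avoids.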