Q: 3
An ML engineer has trained a neural network by using stochastic gradient descent (SGD). The neural
network performs poorly on the test set. The values for training loss and validation loss remain high
and show an oscillating pattern. The values decrease for a few epochs and then increase for a few
epochs before repeating the same cycle.
What should the ML engineer do to improve the training process?
Options
Discussion
Option C
Maybe D. The loss oscillates when the learning rate is too high, so dropping it should steady the model's updates. Not 100% sure, but that's what I've seen in a few practice sets.
Yeah, D. High and oscillating loss is a classic sign of a learning rate that's too high. Dropping it should smooth things out.
D imo. Had something like this in a mock exam; oscillating loss usually means the learning rate is too high. Lowering it helps stabilize training.
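A quick way to see the effect the comments describe: plain gradient descent on a toy 1-D quadratic loss (an assumption standing in for the network's loss surface; the function and step counts below are made up for illustration). With a step size above the stability threshold the loss blows up instead of settling, while a smaller step size gives a steady decrease.

```python
def gradient_descent(lr, steps=20, x=1.0):
    """Run plain gradient descent on f(x) = x**2 and record the loss per step."""
    history = []
    for _ in range(steps):
        history.append(x * x)  # current loss f(x) = x^2
        x -= lr * 2 * x        # gradient step; f'(x) = 2x
    return history

# Too-large learning rate: each update overshoots the minimum, |x| grows,
# and the loss climbs instead of converging.
unstable = gradient_descent(lr=1.05)

# Smaller learning rate: updates stay inside the stable region and the
# loss shrinks toward zero.
stable = gradient_descent(lr=0.1)

print(unstable[-1] > unstable[0])  # loss grew
print(stable[-1] < stable[0])      # loss shrank
```

For this quadratic, any `lr` above 1.0 makes the update multiply `x` by a factor of magnitude greater than 1, which is the same overshoot-and-bounce behavior behind the oscillating training curve in the question.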