Q: 14
An ML engineer is using Amazon SageMaker to train a deep learning model that requires distributed
training. After several training attempts, the ML engineer observes that the instances are not
performing as expected and identifies excessive communication overhead between the training
instances as the cause.
What should the ML engineer do to MINIMIZE the communication overhead between the instances?
Options
Discussion
C
Likely C. For SageMaker distributed training, keeping all instances and the training data in the same Availability Zone cuts inter-node latency and synchronization overhead significantly compared to spreading them across AZs or Regions. Both the official docs and the exam guide call out network proximity as a performance factor. If anyone has run this in actual labs, I'd expect the same behavior, but I'm open to hearing different results.
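For anyone who wants to see what that looks like in practice, here is a minimal sketch using the SageMaker Python SDK, assuming the same-AZ placement answer: pinning the training job to a single subnet (which belongs to exactly one AZ) keeps every training instance in that AZ. The entry point script, IAM role ARN, subnet ID, security group ID, S3 path, and framework versions below are all placeholders, not values from the question.

```python
# Sketch only: launch a distributed SageMaker training job whose instances
# are all placed in one Availability Zone by restricting the job to a
# single subnet. All identifiers below are hypothetical placeholders.
import sagemaker
from sagemaker.pytorch import PyTorch

session = sagemaker.Session()

estimator = PyTorch(
    entry_point="train.py",                               # placeholder training script
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder IAM role
    framework_version="2.1",                              # illustrative version
    py_version="py310",
    instance_count=4,
    instance_type="ml.p4d.24xlarge",
    # A single subnet maps to a single AZ, so all training instances
    # launch there, minimizing inter-node network latency.
    subnets=["subnet-0abc1234"],                          # placeholder subnet
    security_group_ids=["sg-0abc1234"],                   # placeholder security group
    # Enable SageMaker distributed data parallelism for multi-node training.
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
    sagemaker_session=session,
)

# Training data should also live in the same Region (placeholder bucket/path).
estimator.fit({"training": "s3://my-bucket/train/"})
```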
C not D
I remember a similar scenario from labs; in some practice sets the given answer was D, so I'd go with D here.