Q: 14
An ML engineer is using Amazon SageMaker to train a deep learning model that requires distributed
training. After some training attempts, the ML engineer observes that the instances are not
performing as expected. The ML engineer identifies communication overhead between the training
instances.
What should the ML engineer do to MINIMIZE the communication overhead between the instances?
Options
Discussion
C. Keeping training instances and data in the same AZ really cuts network latency for distributed jobs. The official AWS ML exam guide and practice exams stress minimizing cross-AZ traffic for exactly this reason, so I'm pretty confident here.
Option C is it. Keeping compute and data in the same AZ (and subnet) reduces network latency for distributed ML jobs, which exam guides and AWS whitepapers drill on. I remember labs where any cross-AZ setup added delays quickly. Pretty sure about this, but open to other interpretations if someone's seen different results in practice.
Option C
C, not D.
C, not D. Only C puts both compute and data in the same AZ, so network latency is lowest. Pretty sure that's what matters most for distributed training here. If someone disagrees, let me know.
Anyone checked the official doc or tried labs for this scenario?
C
Feels like C, since keeping both compute and data in the same AZ removes cross-AZ latency, which can slow down distributed training. D looks tempting if you're thinking about fault tolerance, but it's not right when communication overhead is the main concern. Correct me if you see it differently.
A is wrong; C. For the lowest communication overhead in distributed training, data and compute need to be in the same AZ. That avoids cross-AZ latency and extra data-transfer costs. Pretty sure C lines up with AWS ML best practices here, but let me know if you think otherwise.
C tbh, seen this same scenario in AWS exam guides and labs.
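For anyone who wants to see what "same AZ" looks like in practice, here's a minimal sketch of the AZ-pinning parts of a SageMaker `CreateTrainingJob` request. In SageMaker's API, a `VpcConfig` with a single subnet pins all training instances to that subnet's Availability Zone. The subnet/security-group IDs, job name, and instance sizing below are placeholder assumptions, and the dict is built in plain Python rather than sent via boto3, so treat this as an illustration rather than a runnable job.

```python
# Sketch: pinning a distributed SageMaker training job to one AZ.
# All IDs and names below are placeholders, not real resources.
# With boto3, this dict would form part of the request passed to
# sagemaker_client.create_training_job(**request).

def build_training_request(subnet_id: str, security_group_id: str) -> dict:
    """Build the AZ-pinning parts of a CreateTrainingJob request."""
    return {
        "TrainingJobName": "distributed-training-same-az",  # hypothetical name
        "ResourceConfig": {
            "InstanceType": "ml.p3.8xlarge",  # example GPU instance type
            "InstanceCount": 4,  # all 4 instances land in the same subnet/AZ
            "VolumeSizeInGB": 100,
        },
        # A single subnet maps to a single Availability Zone, so all
        # inter-instance traffic stays inside that AZ.
        "VpcConfig": {
            "Subnets": [subnet_id],
            "SecurityGroupIds": [security_group_id],
        },
    }

request = build_training_request("subnet-0123456789abcdef0",
                                 "sg-0123456789abcdef0")
print(len(request["VpcConfig"]["Subnets"]))  # 1 subnet -> 1 AZ
```

The equivalent with the SageMaker Python SDK is passing `subnets=[...]` (with one subnet) and `security_group_ids=[...]` to the `Estimator`.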