Q: 5
You recently developed a deep learning model using Keras, and now you are experimenting with
different training strategies. First, you trained the model using a single GPU, but the training process
was too slow. Next, you distributed the training across 4 GPUs using tf.distribute.MirroredStrategy
(with no other changes), but you did not observe a decrease in training time. What should you do?
Options
Discussion
Option D is what I'd pick. With multiple GPUs and MirroredStrategy, you usually need to up the batch size so each GPU can get enough data to process in parallel, otherwise training time won't drop. Happened to me before in practice, but open if anyone disagrees.
Probably D. If you just switch to MirroredStrategy but keep the same (small) batch size, each GPU isn't used efficiently so no real speed gain. Increasing the batch size lets GPUs process more data in parallel. Not 100% sure if dataset sharding is a trap here, but D is what I see in similar questions.
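To make the batch-size point concrete, here's a minimal sketch of the usual pattern (names like `PER_REPLICA_BATCH` are illustrative, not from the question): you scale the global batch by `num_replicas_in_sync` so each GPU still receives a full batch per step.

```python
# Hedged sketch: scaling the global batch size under MirroredStrategy.
# PER_REPLICA_BATCH and the tiny model are placeholder assumptions.
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # uses all visible GPUs (CPU fallback: 1 replica)

PER_REPLICA_BATCH = 64
# Keep the per-replica batch constant; grow the global batch with the replica count,
# so 4 GPUs -> global batch of 256 and each GPU stays fully fed per step.
global_batch = PER_REPLICA_BATCH * strategy.num_replicas_in_sync

with strategy.scope():
    # Model/optimizer must be built inside the strategy scope to be mirrored.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(10, activation="softmax", input_shape=(784,)),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# The dataset is batched with the *global* batch size; MirroredStrategy
# splits each global batch across replicas automatically, e.g.:
# ds = ds.batch(global_batch)
```

If you keep the single-GPU batch size unchanged, each of the 4 GPUs only gets a quarter of a batch per step, which is exactly the underutilization people above are describing.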
B. Saw a similar question on a practice set and picked it for more control with MirroredStrategy.
Is the question assuming we want to keep model accuracy exactly the same, or is a minor change in accuracy acceptable if we can finish training faster? That would make D a safer pick.
D tbh, since without upping batch size the GPUs just aren't being fully used. Pretty sure that's the main trap here, a lot of folks want to jump to A but that's more about dataset distribution, not actual speedup.
Maybe D since if you don’t raise batch size, splitting work over more GPUs usually won’t add up to less time per epoch. There’s a caveat: if the model or dataset is tiny, comms overhead can actually dominate and scaling still doesn’t help. Seen that trip people up in practice. Agree?
Nah, it's gotta be D. If you don't bump up the batch size, the GPUs just won't get utilized in parallel. I've seen this catch folks out before. C sounds tempting, but the real bottleneck here is how much data each GPU gets per step. Disagree?
I don't think it's C. TPUs can be really fast but the question already mentions using GPUs and MirroredStrategy. Usually, if you don’t boost batch size, multi-GPU won’t help much anyway. Anyone see actual speedup with A?
C vs D, this is so annoying since TPUs are always hyped for speed. I'd pick C because using TPUs with TPUStrategy typically gives a big jump in training speed compared to just using multiple GPUs. If anyone got better results with batch size tweaks let me know.
Guessing D here, A tempts people but doesn't address GPU utilization properly in this scenario.