Q: 5
You recently developed a deep learning model using Keras, and now you are experimenting with
different training strategies. First, you trained the model using a single GPU, but the training process
was too slow. Next, you distributed the training across 4 GPUs using tf.distribute.MirroredStrategy
(with no other changes), but you did not observe a decrease in training time. What should you do?
Options
Discussion
Option D is what I'd pick. With multiple GPUs and MirroredStrategy, you usually need to up the batch size so each GPU can get enough data to process in parallel, otherwise training time won't drop. Happened to me before in practice, but open if anyone disagrees.
Probably D. If you just switch to MirroredStrategy but keep the same (small) batch size, each GPU isn't used efficiently so no real speed gain. Increasing the batch size lets GPUs process more data in parallel. Not 100% sure if dataset sharding is a trap here, but D is what I see in similar questions.
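To make the batch-size point concrete, here's a minimal sketch of the usual pattern (names like `PER_REPLICA_BATCH` are illustrative, not from the question): you scale the global batch by `num_replicas_in_sync` so each GPU still receives a full batch per step.

```python
# Hedged sketch: scaling the global batch size under MirroredStrategy.
# PER_REPLICA_BATCH and the tiny model are placeholder assumptions.
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # uses all visible GPUs (CPU fallback: 1 replica)

PER_REPLICA_BATCH = 64
# Keep the per-replica batch constant; grow the global batch with the replica count,
# so 4 GPUs -> global batch of 256 and each GPU stays fully fed per step.
global_batch = PER_REPLICA_BATCH * strategy.num_replicas_in_sync

with strategy.scope():
    # Model/optimizer must be built inside the strategy scope to be mirrored.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(10, activation="softmax", input_shape=(784,)),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# The dataset is batched with the *global* batch size; MirroredStrategy
# splits each global batch across replicas automatically, e.g.:
# ds = ds.batch(global_batch)
```

If you keep the single-GPU batch size unchanged, each of the 4 GPUs only gets a quarter of a batch per step, which is exactly the underutilization people above are describing.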
B. Saw a similar question on a practice set and picked it for more control with MirroredStrategy.
Is the question assuming we want to keep model accuracy exactly the same, or is a minor change in accuracy acceptable if we can finish training faster? That would make D a safer pick.
D tbh, since without upping batch size the GPUs just aren't being fully used. Pretty sure that's the main trap here, a lot of folks want to jump to A but that's more about dataset distribution, not actual speedup.
Maybe D since if you don’t raise batch size, splitting work over more GPUs usually won’t add up to less time per epoch. There’s a caveat: if the model or dataset is tiny, comms overhead can actually dominate and scaling still doesn’t help. Seen that trip people up in practice. Agree?
Nah, it's gotta be D. If you don't bump up the batch size, the GPUs just won't get utilized in parallel. I've seen this catch folks out before. C sounds tempting, but the real bottleneck here is how much data each GPU gets per step. Disagree?
I don't think it's C. TPUs can be really fast but the question already mentions using GPUs and MirroredStrategy. Usually, if you don’t boost batch size, multi-GPU won’t help much anyway. Anyone see actual speedup with A?
C vs D, this is so annoying since TPUs are always hyped for speed. I'd pick C because using TPUs with TPUStrategy typically gives a big jump in training speed compared to just using multiple GPUs. If anyone got better results with batch size tweaks let me know.
Guessing D here, A tempts people but doesn't address GPU utilization properly in this scenario.