Q: 4
You are deploying a large-scale AI model training pipeline on a cloud-based infrastructure that uses
NVIDIA GPUs. During the training, you observe that the system occasionally crashes due to memory
overflows on the GPUs, even though the overall GPU memory usage is below the maximum capacity.
What is the most likely cause of the memory overflows, and what should you do to mitigate this
issue?
Options
Discussion
Option D
A is wrong; it's D. Fragmented memory can block large allocations even when total GPU usage looks fine, so unified memory management (D) would help here. Batch size (A) is a trap since usage didn't exceed capacity. I think D, but open to other views if someone has seen different behavior in practice.
It's D here, since fragmented memory can cause allocation failures even if you haven't hit max total usage. Enabling unified memory gives the system more flexibility to manage those gaps. Not 100%, but it makes the most sense for this scenario. Agree?
D fits best. Fragmented GPU memory can block big allocs even if total usage looks OK, and unified memory management helps smooth that out. I’m pretty sure that’s what they want but happy to hear other takes.
D, saw similar in a practice exam. Fragmented memory is the classic cause of overflows below max usage.
Probably D. Fragmented GPU memory explains why you get overflows even without maxing out usage. Enabling unified memory helps the GPU make better use of what's available. Makes sense here, but if anyone got a C/B scenario working, let me know.
A is wrong; D. Fragmented memory would cause this even if usage isn't maxed out. A similar scenario popped up in a practice set.
D, not A. Batch size matters for total capacity, but if overall usage is below max, it's probably a memory fragmentation issue like D describes. Unified memory helps with that sort of problem. I think D is right, but open to other takes.
Had something like this in a mock, went with A.
Yeah this is D. Fragmented GPU memory can cause this kind of overflow below max usage.
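The fragmentation argument in the comments above can be sketched without a GPU: a toy allocator where total free memory exceeds the request, yet no single contiguous gap can hold it. All names and numbers here are hypothetical, purely to illustrate the failure mode the question describes.

```python
# Toy model of GPU memory fragmentation (no GPU required).
# Free memory is tracked as a list of contiguous gaps left
# behind by previously freed allocations.

def can_allocate(free_gaps, request):
    """A naive allocator needs ONE contiguous gap >= request."""
    return any(gap >= request for gap in free_gaps)

# Hypothetical state: 6 GB free in total, but split into small gaps.
free_gaps = [2, 1, 2, 1]   # GB, scattered across the address space
request = 4                # GB, e.g. one large activation buffer

total_free = sum(free_gaps)            # 6 GB: "usage below capacity"
ok = can_allocate(free_gaps, request)  # fails: no 4 GB contiguous gap

print(f"total free: {total_free} GB, "
      f"contiguous fit for {request} GB: {ok}")
# Total free exceeds the request, yet the allocation fails -- the
# out-of-memory crash in the question. Unified/managed memory (option D)
# gives the driver room to migrate pages and satisfy such requests.
```

This is only a sketch of why "usage below max capacity" does not rule out allocation failure; real allocators (e.g. CUDA's caching allocators) are far more sophisticated, but the contiguity constraint is the core issue.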