Q: 11
You are monitoring the resource utilization of a DGX SuperPOD cluster using NVIDIA Base Command
Manager (BCM). The system is experiencing slow performance, and you need to identify the cause.
What is the most effective way to monitor GPU usage across nodes?
Options
Discussion
Had a similar scenario in practice. Isn't it B, since Base View gives you the whole cluster view?
Probably B, since the dashboard lets you see GPU stats for the whole cluster in real time. D is a classic approach but way too manual for monitoring a SuperPOD. Base View just gives better visibility here; correct me if I'm missing something.
D imo. nvidia-smi is pretty much the go-to for GPU stats, and if you want actual numbers per node, it's reliable. The dashboard might be nice for a big picture but honestly, running nvidia-smi gives you the live GPU utilization directly where the jobs run. Maybe not as centralized, but more granular I think. Let me know if that's off.
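For anyone curious what the D route actually looks like when you script it out, here's a minimal sketch that fans nvidia-smi out over SSH from the head node. Assumptions flagged up front: passwordless SSH is set up, and the dgx01-style hostnames are placeholders for your actual node names. The query flags themselves are standard nvidia-smi options.

```python
#!/usr/bin/env python3
"""Poll GPU utilization across nodes by running nvidia-smi over SSH.

Sketch only: assumes passwordless SSH from the head node; the
hostnames below are placeholders, substitute your real node names.
"""
import subprocess

NODES = ["dgx01", "dgx02", "dgx03", "dgx04"]  # hypothetical node names

# Standard nvidia-smi query flags: emits one CSV line per GPU.
QUERY = (
    "nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total "
    "--format=csv,noheader,nounits"
)

def gpu_stats(node: str) -> list[str]:
    """Return one 'index, util%, mem_used, mem_total' line per GPU on a node."""
    result = subprocess.run(
        ["ssh", node, QUERY],
        capture_output=True, text=True, timeout=15,
    )
    if result.returncode != 0:
        return [f"error: {result.stderr.strip()}"]
    return result.stdout.strip().splitlines()

if __name__ == "__main__":
    for node in NODES:
        for line in gpu_stats(node):
            print(f"{node}: {line}")
```

Even scripted like this, it's still the D approach: you're doing the per-node fan-out yourself, which is exactly the manual overhead the dashboard option is meant to avoid.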
B makes sense, the dashboard gives you real-time cluster stats so you’re not checking every node one by one.
D, nvidia-smi on each node has always shown me exact GPU usage before.
Has anyone here actually used the Base View dashboard in BCM for a live cluster? Wondering if it shows real-time GPU stats for all nodes at once or if you still need to check per node sometimes. Practice exams usually point toward dashboards but curious about your real-world experience with this tool.
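Not the dashboard itself, but if you want a scriptable look at the same metrics BCM collects centrally, you can go through its cmsh shell instead of clicking around Base View. Rough sketch only: the latestmonitoringdata command and its output vary by BCM release, and the node name is a placeholder, so verify against your version's cmsh docs before relying on this.

```python
#!/usr/bin/env python3
"""Pull the latest BCM-collected monitoring data for a node via cmsh.

Hedged sketch: command availability and metric names differ across
BCM releases; check 'latestmonitoringdata' in your cmsh first.
"""
import subprocess

NODE = "dgx01"  # placeholder node name

# cmsh -c runs semicolon-separated cmsh commands non-interactively.
cmd = ["cmsh", "-c", f"device; use {NODE}; latestmonitoringdata"]
out = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
print(out.stdout if out.returncode == 0 else out.stderr)
```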
B for sure, since Base View dashboard gives that real-time cluster-wide GPU usage. D is too hands-on for a big setup.
Honestly I would've picked D at first, since nvidia-smi is the classic GPU check for each node. But for cluster-wide real-time visibility, that's too manual. D is a tempting trap here.
B
Nah, it's B here: Base View dashboard does all nodes in real time. D is just per-node, easy to miss that.