Q: 11
You are monitoring the resource utilization of a DGX SuperPOD cluster using NVIDIA Base Command
Manager (BCM). The system is experiencing slow performance, and you need to identify the cause.
What is the most effective way to monitor GPU usage across nodes?
Options
Discussion
Had a similar scenario in practice. Isn't it B, since Base View gives you the whole cluster view?
Probably B, since the dashboard lets you see GPU stats for the whole cluster in real time. D is a classic approach but way too manual for monitoring a SuperPOD. Base View just gives better visibility here; correct me if I'm missing something.
D imo. nvidia-smi is pretty much the go-to for GPU stats, and if you want actual numbers per node, it's reliable. The dashboard might be nice for a big picture but honestly, running nvidia-smi gives you the live GPU utilization directly where the jobs run. Maybe not as centralized, but more granular I think. Let me know if that's off.
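For anyone curious what the D route actually looks like when you script it out, here's a minimal sketch that fans nvidia-smi out over SSH from the head node. Assumptions flagged up front: passwordless SSH is set up, and the dgx01-style hostnames are placeholders for your actual node names. The query flags themselves are standard nvidia-smi options.

```python
#!/usr/bin/env python3
"""Poll GPU utilization across nodes by running nvidia-smi over SSH.

Sketch only: assumes passwordless SSH from the head node; the
hostnames below are placeholders, substitute your real node names.
"""
import subprocess

NODES = ["dgx01", "dgx02", "dgx03", "dgx04"]  # hypothetical node names

# Standard nvidia-smi query flags: emits one CSV line per GPU.
QUERY = (
    "nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total "
    "--format=csv,noheader,nounits"
)

def gpu_stats(node: str) -> list[str]:
    """Return one 'index, util%, mem_used, mem_total' line per GPU on a node."""
    result = subprocess.run(
        ["ssh", node, QUERY],
        capture_output=True, text=True, timeout=15,
    )
    if result.returncode != 0:
        return [f"error: {result.stderr.strip()}"]
    return result.stdout.strip().splitlines()

if __name__ == "__main__":
    for node in NODES:
        for line in gpu_stats(node):
            print(f"{node}: {line}")
```

Even scripted like this, it's still the D approach: you're doing the per-node fan-out yourself, which is exactly the manual overhead the dashboard option is meant to avoid.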
B makes sense, the dashboard gives you real-time cluster stats so you’re not checking every node one by one.
D, nvidia-smi on each node has always shown me exact GPU usage before.
Has anyone here actually used the Base View dashboard in BCM for a live cluster? Wondering if it shows real-time GPU stats for all nodes at once or if you still need to check per node sometimes. Practice exams usually point toward dashboards but curious about your real-world experience with this tool.
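Not the dashboard itself, but if you want a scriptable look at the same metrics BCM collects centrally, you can go through its cmsh shell instead of clicking around Base View. Rough sketch only: the latestmonitoringdata command and its output vary by BCM release, and the node name is a placeholder, so verify against your version's cmsh docs before relying on this.

```python
#!/usr/bin/env python3
"""Pull the latest BCM-collected monitoring data for a node via cmsh.

Hedged sketch: command availability and metric names differ across
BCM releases; check 'latestmonitoringdata' in your cmsh first.
"""
import subprocess

NODE = "dgx01"  # placeholder node name

# cmsh -c runs semicolon-separated cmsh commands non-interactively.
cmd = ["cmsh", "-c", f"device; use {NODE}; latestmonitoringdata"]
out = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
print(out.stdout if out.returncode == 0 else out.stderr)
```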
B for sure, since Base View dashboard gives that real-time cluster-wide GPU usage. D is too hands-on for a big setup.
Honestly I would've picked D at first, since nvidia-smi is the classic GPU check for each node. But for cluster-wide real-time visibility, that's too manual. D is a tempting trap here.
B
Nah, it's B here: Base View dashboard does all nodes in real time. D is just per-node, easy to miss that.