Question 13

Question

In managing an AI data center, you need to ensure continuous optimal performance and quickly
respond to any potential issues. Which monitoring tool or approach would best suit the need to
monitor GPU health, usage, and performance metrics across all deployed AI workloads?

Accepted Answer

NVIDIA DCGM (Data Center GPU Manager)
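
For context on what DCGM-based monitoring looks like in practice: DCGM metrics are commonly exposed to Prometheus through dcgm-exporter, which publishes per-GPU fields such as DCGM_FI_DEV_GPU_UTIL and DCGM_FI_DEV_GPU_TEMP in the Prometheus text format. The sketch below is only an illustration of reading such a payload. The sample text, the helper names (parse_gpu_metrics, hot_gpus), and the 85 C threshold are assumptions made for this example, not real exporter output or an official API.

```python
import re
from collections import defaultdict

# Hypothetical sample in Prometheus text format. Real dcgm-exporter output
# carries more labels and many more fields than shown here.
SAMPLE_METRICS = """\
# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="node-a"} 97
DCGM_FI_DEV_GPU_UTIL{gpu="1",Hostname="node-a"} 12
# HELP DCGM_FI_DEV_GPU_TEMP GPU temperature (in C).
# TYPE DCGM_FI_DEV_GPU_TEMP gauge
DCGM_FI_DEV_GPU_TEMP{gpu="0",Hostname="node-a"} 88
DCGM_FI_DEV_GPU_TEMP{gpu="1",Hostname="node-a"} 54
"""

# Matches lines of the form: metric_name{label="value",...} number
LINE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_gpu_metrics(text):
    """Group metric samples by the value of their 'gpu' label."""
    gpus = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and HELP/TYPE comments.
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        gpu = labels.get("gpu")
        if gpu is None:
            continue
        gpus[gpu][match.group("name")] = float(match.group("value"))
    return dict(gpus)


def hot_gpus(metrics, max_temp_c=85.0):
    """Return GPU ids whose temperature exceeds max_temp_c (assumed threshold)."""
    return sorted(
        gpu for gpu, fields in metrics.items()
        if fields.get("DCGM_FI_DEV_GPU_TEMP", 0.0) > max_temp_c
    )


if __name__ == "__main__":
    metrics = parse_gpu_metrics(SAMPLE_METRICS)
    print(metrics)
    print("GPUs over threshold:", hot_gpus(metrics))
```

In a real deployment the text would come from the exporter's metrics endpoint and alerting would normally live in Prometheus rules rather than a script; this only shows the shape of the per-GPU data that DCGM makes available.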

Sara I. · Answer

Had something like this in a mock exam; D is correct for sure. DCGM gives deep GPU insights out of the box, which is what most exam scenarios are after. Anyone disagree?

Parker · Answer

I don't think B is right here, even though Prometheus with Node Exporter can be extended for GPU stats. D (NVIDIA DCGM) is purpose-built for GPU health, so it fits the question much better, imo. Has anyone else seen something similar on practice exams?

Sara · Answer

Option B, Prometheus with Node Exporter. Some setups use Node Exporter for GPU stats, so it's a common trap here.

Parker B. · Answer

It's B, since Prometheus with Node Exporter can collect system metrics and you can add exporters for GPU. Not 100% sure, but I've seen setups use it for monitoring a range of hardware.
