Q: 10
[AI Network Architecture]
In an AI cluster using NVIDIA GPUs, which configuration parameter in the NicClusterPolicy custom
resource is crucial for enabling high-speed GPU-to-GPU communication across nodes?
Options
Discussion
I don't think it's C. A is the NicClusterPolicy parameter for RDMA comms between GPUs; the secondary network option is a trap here.
Looks like it's A, since that's the NicClusterPolicy setting for enabling RDMA and fast GPU interconnects. No explanation needed here.
Maybe A. Official doc and lab scenarios mention the RDMA Shared Device Plugin when setting up fast GPU-to-GPU communication. Best to double-check in the NVIDIA admin guide if unsure, since cluster policy params can trip you up. Anyone seen a different config in exam practice?
Anyone checked the official practice tests? Saw a similar cluster policy config one and B was selected in the rationale.
A is the one you actually set in the NicClusterPolicy to expose RDMA capability, which lets Kubernetes pods use high-speed GPU-to-GPU comms across nodes. The OFED driver (C) is required on the host, but that's not a config param in this policy. Pretty sure A's right for this context, unless I'm missing some niche use case.
I'm not totally sure but I think C. OFED Driver. From what I remember, you need OFED for network acceleration with RDMA, so it sounds like that's the key thing to enable fast GPU comms across nodes? Please let me know if I'm missing something here.
C, had something like this in a mock and picked OFED Driver.
C
Saw this setup in a practice exam, official doc and hands-on labs point to the RDMA plugin as key here.
A tbh, since the RDMA Shared Device Plugin is what actually exposes the RDMA interfaces to K8s and enables GPUDirect RDMA. You need that set in NicClusterPolicy for high-speed GPU comms across nodes. Pretty sure that's what they're looking for here.
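For anyone who wants to see what this looks like in practice, here's a rough sketch of a NicClusterPolicy that enables the RDMA Shared Device Plugin alongside the OFED driver. Image names, versions, resource names, and the interface selector are illustrative placeholders, not exact values — check the NVIDIA Network Operator docs for the images/versions matching your cluster:

```yaml
# Sketch of a NicClusterPolicy enabling the RDMA Shared Device Plugin.
# Versions, repositories, resourceName, and ifNames below are placeholders.
apiVersion: mellanox.com/v1alpha1
kind: NicClusterPolicy
metadata:
  name: nic-cluster-policy
spec:
  # OFED driver (option C): required on the host for RDMA to work at all,
  # but it is not the setting that exposes RDMA resources to pods.
  ofedDriver:
    image: mofed
    repository: nvcr.io/nvidia/mellanox
    version: <ofed-version>
  # RDMA Shared Device Plugin (option A): this is what advertises RDMA
  # devices as allocatable resources to Kubernetes, enabling high-speed
  # GPU-to-GPU communication (e.g. GPUDirect RDMA) across nodes.
  rdmaSharedDevicePlugin:
    image: k8s-rdma-shared-dev-plugin
    repository: ghcr.io/mellanox
    version: <plugin-version>
    config: |
      {
        "configList": [
          {
            "resourceName": "rdma_shared_device_a",
            "rdmaHcaMax": 63,
            "selectors": {
              "ifNames": ["ens1f0"]
            }
          }
        ]
      }
```

Pods then request the advertised resource (e.g. `rdma/rdma_shared_device_a`) in their resource limits to get RDMA access — which is why A, not C, is the config parameter the question is after.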