Q: 10
[AI Network Architecture]
In an AI cluster using NVIDIA GPUs, which configuration parameter in the NicClusterPolicy custom
resource is crucial for enabling high-speed GPU-to-GPU communication across nodes?
Discussion
I'm not totally sure, but I think it's C (OFED Driver). From what I remember, you need OFED for RDMA network acceleration, so that sounds like the key thing for fast GPU-to-GPU communication across nodes. Please let me know if I'm missing something here.
C or A. OFED (C) is definitely needed at the system level, but for NicClusterPolicy in k8s, A is probably the config you have to set. Not 100 percent, so open to correction.
It's A, but I was also thinking C at first, since OFED is required for RDMA in general. From what I've heard, the RDMA Shared Device Plugin is the component you actually set in NicClusterPolicy. Not 100 percent sure, though.
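To make the OFED vs. device plugin distinction concrete, here's a rough sketch of what a NicClusterPolicy with both components enabled might look like, built as a Python dict so it can be checked programmatically. The field names (`ofedDriver`, `rdmaSharedDevicePlugin`, and the plugin's JSON `config` with `configList`/`resourceName`/`rdmaHcaMax`/`selectors`) follow the NVIDIA Network Operator examples as far as I remember them. The repositories, the `<version>` placeholder, and the interface name `ens1f0` are assumptions, so check the operator docs for your release before using any of it.

```python
import json

# ASSUMED placeholder values: look up the real repositories and versions
# for your Network Operator release.
OFED_REPO = "nvcr.io/nvidia/mellanox"
PLUGIN_REPO = "ghcr.io/mellanox"
VERSION_PLACEHOLDER = "<version>"


def build_nic_cluster_policy(if_names):
    """Return a NicClusterPolicy manifest (as a dict) that enables both the
    OFED driver and the RDMA shared device plugin."""
    rdma_config = {
        "configList": [
            {
                # Extended resource name that pods will later request.
                "resourceName": "rdma_shared_device_a",
                # Max number of pods allowed to share each RDMA device.
                "rdmaHcaMax": 63,
                # Select NICs by interface name (names are caller-supplied).
                "selectors": {"ifNames": list(if_names)},
            }
        ]
    }
    return {
        "apiVersion": "mellanox.com/v1alpha1",
        "kind": "NicClusterPolicy",
        "metadata": {"name": "nic-cluster-policy"},
        "spec": {
            # Installs the OFED kernel drivers on the nodes (the "C" part).
            "ofedDriver": {
                "image": "mofed",
                "repository": OFED_REPO,
                "version": VERSION_PLACEHOLDER,
            },
            # Advertises RDMA devices to the kubelet as schedulable
            # resources (the "A" part).
            "rdmaSharedDevicePlugin": {
                "image": "k8s-rdma-shared-dev-plugin",
                "repository": PLUGIN_REPO,
                "version": VERSION_PLACEHOLDER,
                # The plugin takes its configuration as a JSON string.
                "config": json.dumps(rdma_config),
            },
        },
    }


if __name__ == "__main__":
    # "ens1f0" is a hypothetical interface name.
    print(json.dumps(build_nic_cluster_policy(["ens1f0"]), indent=2))
```

Dumping the dict to YAML or JSON gives something you could `kubectl apply`. The point is that `ofedDriver` and `rdmaSharedDevicePlugin` are separate sections of the spec, which is why people in this thread keep landing on "both matter, but A is the NicClusterPolicy setting."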
I don’t think it’s C, pretty sure A is needed in NicClusterPolicy since OFED alone won’t expose RDMA devices.
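The "OFED alone won't expose RDMA devices" point shows up on the workload side: a pod only gets an RDMA device if it requests the extended resource that the device plugin advertises. Below is a minimal sketch, assuming the plugin's default `rdma` resource prefix (so the resource from the policy above would appear as `rdma/rdma_shared_device_a`) and the usual `nvidia.com/gpu` GPU resource. The pod name and image are hypothetical, and `IPC_LOCK` is added because RDMA memory registration typically needs to lock memory.

```python
# ASSUMED: default "rdma" prefix plus the resourceName from the policy config.
RDMA_RESOURCE = "rdma/rdma_shared_device_a"
GPU_RESOURCE = "nvidia.com/gpu"


def build_rdma_pod(name, image, gpus=1):
    """Return a Pod manifest (as a dict) that requests GPUs and one shared
    RDMA device. Without the RDMA request, the container sees no RDMA device
    even if OFED is installed on the node."""
    limits = {GPU_RESOURCE: gpus, RDMA_RESOURCE: 1}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # RDMA memory registration usually needs to lock memory.
                    "securityContext": {"capabilities": {"add": ["IPC_LOCK"]}},
                    # Extended resources need requests == limits.
                    "resources": {"limits": limits, "requests": dict(limits)},
                }
            ]
        },
    }


def requests_rdma(pod):
    """True if any container in the pod asks for the RDMA resource."""
    return any(
        RDMA_RESOURCE in c.get("resources", {}).get("limits", {})
        for c in pod["spec"]["containers"]
    )


if __name__ == "__main__":
    # "nccl-test" and the image name are hypothetical.
    pod = build_rdma_pod("nccl-test", "example.com/nccl-tests:latest", gpus=8)
    print(requests_rdma(pod))
```

In a real multi-node setup you'd usually also attach a secondary network to the pod, but that depends on how the rest of the operator is configured, so I've left it out.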