Q: 18
Which component of the NVIDIA AI software stack is primarily responsible for optimizing deep learning inference performance by leveraging the specific architecture of NVIDIA GPUs?
Options
Discussion
TensorRT (B) is the one built for serious inference optimization on NVIDIA GPUs. It applies optimizations like layer fusion, kernel auto-tuning, and reduced-precision calibration (FP16/INT8) to squeeze out maximum performance, including targeting Tensor Cores where the hardware has them. cuDNN and CUDA are more general-purpose building blocks, and Triton Inference Server just serves models; TensorRT is the component that actually rewrites and accelerates the model graph. Pretty sure B is right here. Disagree?
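For anyone curious what that looks like in practice, here's a minimal sketch of the TensorRT build flow using its Python API. Assumptions on my part, not from the question: TensorRT 8.x bindings, and "model.onnx" / "model.plan" as placeholder file names.

```python
import tensorrt as trt

# Minimal sketch: build an optimized TensorRT engine from an ONNX model.
# Assumes TensorRT 8.x Python bindings; "model.onnx" is a placeholder path.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)

# Explicit-batch network definition, required by the ONNX parser.
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
# Allow reduced precision so the builder can target Tensor Cores.
config.set_flag(trt.BuilderFlag.FP16)

# The builder performs layer fusion, kernel/tactic selection, and precision
# assignment here; this is the graph rewriting described above.
serialized_engine = builder.build_serialized_network(network, config)

with open("model.plan", "wb") as f:
    f.write(serialized_engine)
```

The key point for the question is that all the architecture-specific optimization happens inside that build step, which is exactly what cuDNN, CUDA, and Triton don't do.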