Q: 15
You have noticed that users can access all GPUs on a node even when they request only one GPU in
their job script using --gres=gpu:1. This is causing resource contention and inefficient GPU usage.
What configuration change would you make to restrict users’ access to only their allocated GPUs?
Options
Discussion
Option B. I've seen this on other clusters: you have to set ConstrainDevices=yes in cgroup.conf, or Slurm won't restrict GPU access. The other options don't deal with device isolation directly.
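For anyone applying this, here is a minimal sketch of the relevant cgroup.conf lines, assuming the cgroup task plugin is already active in slurm.conf (TaskPlugin=task/cgroup); exact settings vary by Slurm version and site:

    # cgroup.conf -- confine each job to the devices it was allocated (includes GPUs)
    ConstrainDevices=yes
    # commonly enabled alongside it, though not strictly required for GPU isolation:
    ConstrainCores=yes
    ConstrainRAMSpace=yes

The GPUs also need to be declared in gres.conf and GresTypes=gpu set in slurm.conf; restart slurmd on the affected nodes after the change.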
Totally agree, B. Setting ConstrainDevices=yes is how you stop jobs from hogging all GPUs on the node.
Pretty sure it's B here. Only ConstrainDevices=yes in cgroup.conf will actually limit the GPUs a job can see; D is a common trap since adding CPUs doesn't isolate devices. If someone saw it work differently, let me know.
B, enabling ConstrainDevices in cgroup.conf is the fix for GPU isolation. The other options won't stop users from grabbing more GPUs than assigned. Pretty sure that's the config needed if you're using Slurm.
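A quick sanity check after restarting the daemons (assuming NVIDIA GPUs with nvidia-smi available on the nodes):

    srun --gres=gpu:1 nvidia-smi -L

With ConstrainDevices=yes this should list only the single allocated GPU instead of every GPU on the node; Slurm also sets CUDA_VISIBLE_DEVICES for the job to match the allocation.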