Q: 15
You have noticed that users can access all GPUs on a node even when they request only one GPU in
their job script using --gres=gpu:1. This is causing resource contention and inefficient GPU usage.
What configuration change would you make to restrict users’ access to only their allocated GPUs?
Options
Discussion
Option B. Seen this on other clusters: you have to set ConstrainDevices=yes in cgroup.conf or Slurm won't restrict GPU access. The other options don't deal with device isolation directly.
Check the official guide or a lab environment, both show B and cgroup.conf for scenarios like this.
D. Modifying the job script to ask for CPUs and GPUs might help resource allocation, but unless the system actually enforces device access, users can still see all GPUs. Pretty sure that's a trap, but it seems logical at first.
D. If you specify both GPUs and CPUs in the script, you're telling the scheduler to allocate more resources per job, so maybe there's better overall isolation. Not totally certain, but I think that helps reduce resource contention. Anyone disagree?
B, saw a similar thing in the official guide and lab environments.
Maybe D, since adding more resource requests in the script can help isolation, but B is the real trap here.
Not D, pretty sure it has to be B. Only B (ConstrainDevices in cgroup.conf) actually restricts GPU device access at the OS level; changing the job script in D doesn't stop jobs from seeing all GPUs. D's a common distractor here.
It's B here, since enabling ConstrainDevices in cgroup.conf is exactly how you tell Slurm to restrict a user's job to only the GPUs it was allocated. None of the others actually enforce device access. I remember this coming up in official training material too, but if there’s a different method folks have used, let me know.
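If you want to see the OS-level restriction for yourself, a rough sketch like the one below can be run inside a job step. It only tries to open each device node and reports which opens succeed. The helper name accessible_devices and the /dev/nvidia[0-9]* glob are my own assumptions (typical NVIDIA device naming), not anything Slurm provides, and the exact error you get on a blocked device may vary by cgroup version.

```python
import glob
import os


def accessible_devices(paths):
    """Return the subset of paths that this process is allowed to open.

    Hypothetical helper: with device constraints enforced, opening a GPU
    device node outside the job's allocation is expected to fail, so it
    is left out of the result.
    """
    allowed = []
    for path in paths:
        try:
            fd = os.open(path, os.O_RDONLY)
        except OSError:
            # Permission denied, missing node, or any other open failure.
            continue
        os.close(fd)
        allowed.append(path)
    return allowed


if __name__ == "__main__":
    # Assumed device naming; adjust the pattern for your nodes.
    candidates = sorted(glob.glob("/dev/nvidia[0-9]*"))
    print("openable GPU device nodes:", accessible_devices(candidates))
```

Running it under something like srun --gres=gpu:1 before and after the config change should, if the assumption holds, go from listing every GPU node to listing only the allocated one.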
Totally agree, B. Setting ConstrainDevices=yes is how you stop jobs from hogging all GPUs on the node.
B. Only ConstrainDevices in cgroup.conf actually locks jobs to just their assigned GPUs. The other options don't enforce that kind of device isolation at all. Seen this fix used in actual Slurm configs before, but let me know if anyone's tried something different.
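For anyone who wants to sanity-check their own config, here is a minimal sketch that parses cgroup.conf-style text and reports whether ConstrainDevices is turned on. The function names and the example fragment are made up for illustration; the only setting taken from this thread is ConstrainDevices=yes. As far as I know the setting also relies on the cgroup task plugin (TaskPlugin=task/cgroup in slurm.conf) and on the GPUs being defined as GRES, so treat this as a partial check rather than proof of isolation.

```python
def parse_slurm_conf(text):
    """Parse Key=Value lines from Slurm-style config text into a dict.

    Keys are lower-cased because Slurm treats parameter names as
    case-insensitive. Anything after '#' on a line is ignored.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip().lower()] = value.strip()
    return settings


def devices_constrained(cgroup_conf_text):
    """Return True if the text sets ConstrainDevices=yes (hypothetical helper)."""
    settings = parse_slurm_conf(cgroup_conf_text)
    return settings.get("constraindevices", "no").lower() == "yes"


# Illustrative fragment only, not a complete cgroup.conf.
EXAMPLE_CGROUP_CONF = """
# cgroup.conf
ConstrainDevices=yes
"""

if __name__ == "__main__":
    print(devices_constrained(EXAMPLE_CGROUP_CONF))
```

To check a real node, read the file contents (often /etc/slurm/cgroup.conf, though the path depends on the install) and pass the text to devices_constrained.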