NVIDIA NCP AIO
Q: 1
You are managing a high-availability (HA) cluster that hosts mission-critical applications. One of the
nodes in the cluster has failed, but the application remains available to users.
What mechanism is responsible for ensuring that the workload continues to run without
interruption?
Options
Q: 2
You are tasked with deploying a deep learning framework container from NVIDIA NGC on a
standalone GPU-enabled server.
What must you complete before pulling the container? (Choose two.)
Options
Q: 3
A data scientist is training a deep learning model and notices slower-than-expected training times.
The data scientist asks a system administrator to investigate. The system administrator
suspects that disk I/O is the bottleneck.
What command should be used?
Options
Q: 4
A system administrator wants to run these two commands in Base Command Manager:
main showprofile
device status apc01
What command should the system administrator use from the management node system shell?
Options
Q: 5
You are managing a Kubernetes cluster running AI training jobs using TensorFlow. The jobs require
access to multiple GPUs across different nodes, but inter-node communication seems slow,
impacting performance.
What is a potential networking configuration you would implement to optimize inter-node
communication for distributed training?
Options
Q: 6
You are managing an on-premises cluster using NVIDIA Base Command Manager (BCM) and need to
extend your computational resources into AWS when your local infrastructure reaches peak capacity.
What is the most effective way to configure cloudbursting in this scenario?
Options
Q: 7
You are managing a Slurm cluster with multiple GPU nodes, each equipped with different types of
GPUs. Some jobs are being allocated GPUs that should be reserved for other purposes, such as
display rendering.
How would you ensure that only the intended GPUs are allocated to jobs?
Options
Q: 8
An organization has multiple containers and wants to view STDIN, STDOUT, and STDERR I/O streams
of a specific container.
What command should be used?
Options
Q: 9
You are an administrator managing a large-scale Kubernetes-based GPU cluster using Run:AI.
To automate repetitive administrative tasks and efficiently manage resources across multiple nodes,
which of the following is essential when using the Run:AI Administrator CLI for environments where
automation or scripting is required?
Options
Q: 10
A system administrator manages a high-performance computing (HPC) cluster that uses an InfiniBand
fabric for high-speed interconnects between nodes. Researchers report unusually slow data transfer
rates between two specific compute nodes. The system administrator needs to ensure the path
between these two nodes is optimal.
What command should be used?
Options