NVIDIA NCP-AIO - NVIDIA AI Operations

Total 66 questions

You need to do maintenance on a node. What should you do first?

A. Drain the compute node using scontrol update.
B. Set the node state to down in Slurm before completing maintenance.
C. Set the node state to down in Slurm before completing maintenance.
D. Disable job scheduling on all compute nodes in Slurm before completing maintenance.
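
For reference, draining a node in Slurm is typically done through `scontrol update` with `State=DRAIN` and a `Reason`. The sketch below only builds the command line and does not contact a cluster. The node name `node001`, the reason text, and the helper name `build_drain_command` are placeholders chosen for illustration.

```python
import shlex


def build_drain_command(node_name: str, reason: str) -> list[str]:
    """Return the argv for draining a Slurm node (hypothetical helper)."""
    # Slurm expects a Reason when a node is set to DRAIN or DOWN.
    return [
        "scontrol", "update",
        f"NodeName={node_name}",
        "State=DRAIN",
        f"Reason={reason}",
    ]


# Placeholder node name and reason, for illustration only.
cmd = build_drain_command("node001", "scheduled maintenance")
print(shlex.join(cmd))
```

A drained node finishes its running jobs but accepts no new ones, which is why draining usually comes before any hands-on maintenance.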

A system administrator needs to lower latency for an AI application by utilizing GPUDirect Storage.

Which two (2) bottlenecks are avoided with this approach? (Choose two.)

A. PCIe
B. CPU
C. NIC
D. System Memory
E. DPU

You are configuring cloudbursting for your on-premises cluster using BCM, and you plan to extend the cluster into both AWS and Azure.

What is a key requirement for enabling cloudbursting across multiple cloud providers?

A. You only need to configure credentials for one cloud provider, as BCM will automatically replicate them across other providers.
B. You need to set up a single set of credentials that works across both AWS and Azure for seamless integration.
C. You must configure separate credentials for each cloud provider in BCM to enable their use in the cluster extension process.
D. BCM automatically detects and configures credentials for all supported cloud providers without requiring admin input.

A system administrator needs to scale a Kubernetes Job to 4 replicas.

What command should be used?

A. kubectl stretch job --replicas=4
B. kubectl autoscale deployment job --min=1 --max=10
C. kubectl scale job --replicas=4
D. kubectl scale job -r 4

A data scientist is training a deep learning model and notices slower-than-expected training times. The data scientist asks a system administrator to investigate, and the administrator suspects that disk I/O is the bottleneck.

Which command should be used to check?

A. tcpdump
B. iostat
C. nvidia-smi
D. htop
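
For context, `iostat` (from the sysstat package) reports per-device I/O statistics, and its extended report includes columns such as `%util`. The sketch below assembles an `iostat` invocation and runs it only if the tool is installed. The helper name `iostat_command` and the default interval and count are assumptions for illustration.

```python
import shutil
import subprocess


def iostat_command(interval_s: int = 1, count: int = 3) -> list[str]:
    """Build an iostat argv for extended device statistics (hypothetical helper)."""
    # -d: device utilization report only; -x: extended statistics.
    # The trailing numbers are the sampling interval in seconds and the report count.
    return ["iostat", "-d", "-x", str(interval_s), str(count)]


cmd = iostat_command()
if shutil.which("iostat"):
    # Run only when iostat is actually available on this machine.
    subprocess.run(cmd, check=False)
else:
    print("iostat not found; would run:", " ".join(cmd))
```

The first report from `iostat` covers the time since boot, so the later samples are usually the ones worth reading when diagnosing a live training job.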

A system administrator needs to configure and manage multiple installations of NVIDIA hardware, ranging from a single DGX BasePOD to a SuperPOD.

Which software stack should be used?

A. NetQ
B. Fleet Command
C. Magnum IO
D. Base Command Manager

You are managing a high availability (HA) cluster that hosts mission-critical applications. One of the nodes in the cluster has failed, but the application remains available to users.

What mechanism is responsible for ensuring that the workload continues to run without interruption?

A. Load balancing across all nodes in the cluster.
B. Manual intervention by the system administrator to restart services.
C. The failover mechanism that automatically transfers workloads to a standby node.
D. Data replication between nodes to ensure data integrity.

You are an administrator managing a large-scale Kubernetes-based GPU cluster using Run:AI.

You want to automate repetitive administrative tasks and manage resources efficiently across multiple nodes. Which of the following is essential when using the Run:AI Administrator CLI in environments where automation or scripting is required?

A. Use the runai-adm command to directly update Kubernetes nodes without requiring kubectl.
B. Use the CLI to manually allocate specific GPUs to individual jobs for better resource management.
C. Ensure that the Kubernetes configuration file is set up with cluster administrative rights before using the CLI.
D. Install the CLI on Windows machines to take advantage of its scripting capabilities.

Your organization is deploying an AI workload that requires high-throughput access to shared storage across multiple servers. The workload involves both training and inference tasks that need fast read and write speeds.

Which storage architecture would best support this AI workload?

A. Use local storage on each server to minimize network traffic between nodes.
B. Prioritize write performance over read performance since training tasks dominate AI workflows.
C. A high-performance shared storage system that supports both high read and write IO performance.
D. Use SSD-based shared storage systems to save costs while scaling up storage capacity.