NVIDIA NCP-AII - NVIDIA AI Infrastructure

Total 123 questions

A system administrator needs to install a GPU/DPU in a server. The server has a free PCI-e slot, there are enough free PCI-e lanes, and there is enough room for the card. Which procedure should be followed?

A.

Ensure the server has enough power. Verify compatibility of cables with the server's platform. Make sure the server is down to remove cables safely. Do not wear an ESD bracelet.

B.

Ensure the server has enough power. Make sure the server is down to remove cables safely. Wear an ESD bracelet.

C.

Ensure the server has enough power. Make sure the server is up and running with attached cables. Wear an ESD bracelet.

D.

Ensure the server has enough power. Verify compatibility of cables with the server's platform. Make sure the server is down to remove cables safely. Wear an ESD bracelet.

Two ports must be connected, but one is SFP and the other is QSFP. For example, a 25 GbE host channel adapter must connect to a QSFP port capable of both 100 GbE and 25 GbE. Which of the following solutions would best meet this requirement?

A.

SFP Connectors

B.

SFP to 1G BASE-T (RJ45) adapter

C.

QSA Adapter

An engineer needs to completely remove NVIDIA GPU drivers from an Ubuntu 22.04 system to troubleshoot conflicts. Which command sequence ensures all driver components are purged?

A.

sudo ubuntu-drivers uninstall

B.

sudo rm -rf /usr/lib/nvidia

C.

sudo apt-get remove nvidia-driver-550

D.

sudo apt-get purge nvidia-* && sudo apt-get autoremove

A user encounters "permission denied" errors when running GPU-accelerated containers on a Secure Boot-enabled system. What resolves this?

A.

Enroll the MOK and sign NVIDIA kernel modules.

B.

Reinstall Docker without the NVIDIA runtime.

C.

Disable SELinux to relax unnecessary security policies.

D.

Run Docker with sudo for elevated privileges.

You are installing the operating system as part of the initial setup for a new NVIDIA Base Command Manager cluster. Which two of the following actions are essential for a successful OS installation on the cluster’s head node?

Pick the 2 correct responses below.

A.

Download the latest BCM ISO and verify its integrity using the provided checksum, then start the installation.

B.

Configure network switches for PXE boot to all compute nodes before installing the OS on the head node.

C.

Set the desired time zone and configure NTP synchronization during the OS installation wizard.

D.

Start the head node OS installation process with the system BIOS set to legacy boot mode instead of UEFI.
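The integrity check mentioned in option A can be sketched as follows. This is a minimal illustration, assuming the published checksum is a SHA-256 hex digest; the actual algorithm and the file name depend on what is provided alongside the ISO download, and the function names here are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected_hex):
    """Compare the file's digest with the published value (assumed SHA-256)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A call such as `checksum_matches("bcm.iso", published_value)` (both arguments are placeholders) returns `True` only when the downloaded image matches the published digest, so the installation would start only after that check passes.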

You are responsible for ensuring interoperability between AI applications deployed across a diverse IT landscape, including an on-premises data center equipped with NVIDIA GPUs and multiple cloud platforms from different vendors. These environments need to support complex AI workflows that involve large-scale data processing, real-time analytics, and machine learning model training. To maintain consistent performance and flexibility, which strategy should you prioritize?

A.

Choose one vendor and standardize on one storage solution across all environments to simplify management and improve interoperability.

B.

Implement a multi-cloud strategy that uses only native storage solutions in each cloud platform while relying on middleware to ensure interoperability and data consistency.

C.

Ensure that all environments use compatible storage protocols and APIs, such as NFS or S3, to facilitate data exchange and integration across platforms.

D.

Focus only on increasing network bandwidth between locations to reduce latency and improve data transfer speeds.

A system administrator noticed a failure on a DGX H100 server. After a reboot, only the BMC is available. What could be the reason for this behavior?

A.

The network card has no link / connection.

B.

A boot disk has failed.

C.

Multiple GPUs have failed.

D.

There are more than two failed power supplies.

When updating the firmware on an NVLink switch transceiver, how can an engineer apply new firmware without interrupting the network?

A.

mlxfwreset -d -lid 27 reset --yes to reset the transceiver

B.

Physically disconnect and reconnect the transceiver.

C.

flint -d -lid 27 --linkx --linkx_auto_update --activate

D.

nv action reboot system to force immediate activation.

After updating BlueField-3 DPU BMC firmware via Redfish, the engineer observes “TaskState: Running” but no progress after 15 minutes. How should they track the update’s completion status?

A.

Check /var/log/messages on the DPU operating system for update logs.

B.

Query the DPU BMC with the Task ID of the installation process.

C.

Power cycle the DPU immediately to force a rollback.

D.

Run bfrec --status on the DPU to view flash progress.
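Querying a BMC by Task ID, as option B describes, could look roughly like the sketch below. It assumes the BMC exposes the standard Redfish TaskService collection at `/redfish/v1/TaskService/Tasks/<id>` and accepts HTTP basic authentication; the host, credentials, and function names are hypothetical, and TLS handling for self-signed BMC certificates is left out.

```python
import base64
import json
import urllib.request

# Task states after which a Redfish task no longer changes.
TERMINAL_STATES = {"Completed", "Exception", "Killed", "Cancelled"}


def task_url(bmc_host, task_id):
    """Build the URL of a task resource (assumes the standard TaskService path)."""
    return f"https://{bmc_host}/redfish/v1/TaskService/Tasks/{task_id}"


def summarize_task(task):
    """Reduce a task resource (already parsed from JSON) to the fields of interest."""
    state = task.get("TaskState", "Unknown")
    return {
        "state": state,
        "percent": task.get("PercentComplete"),
        "finished": state in TERMINAL_STATES,
    }


def fetch_task(bmc_host, task_id, user, password):
    """GET the task resource from the BMC using basic authentication (an assumption)."""
    request = urllib.request.Request(task_url(bmc_host, task_id))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Calling `summarize_task(fetch_task(...))` at intervals shows whether `TaskState` has left `Running` and, where the BMC reports it, how far `PercentComplete` has advanced.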

During cluster validation, the Cable Validation Tool (CVT) reports "Underperforming (BER)" for an InfiniBand link. Which BER thresholds indicate a critical signal quality issue requiring cable replacement?

A.

Rx power variance > 3dB between lanes

B.

Effective BER > 0 during the first 125 minutes of link operation

C.

Raw BER > 1e-12 or Effective BER > 1.5E-254 for < 6hr measurements

D.

Temperature > 85°C on transceiver module