
NVIDIA NCP-AAI - NVIDIA Agentic AI

Total 121 questions

An AI agent must interact with multiple external services, handle variable user requests, and maintain reliable operation in production.

Which design principle is most critical for ensuring stable and resilient integration with external systems?

A. Bypassing error handling to reduce latency during API calls

B. Implementing timeouts and circuit breakers for external service calls

C. Storing all external credentials directly in the agent’s source code

D. Using hardcoded endpoints without configuration management
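
For reference, the timeout and circuit-breaker pattern named in option B can be sketched roughly as follows. This is a minimal illustration, not a production implementation: the `CircuitBreaker` class, its parameters, and `CircuitOpenError` are hypothetical names introduced here, and the wrapped function is assumed to enforce its own timeout (for example `urllib.request.urlopen(url, timeout=2.0)`).

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and calls are rejected immediately."""


class CircuitBreaker:
    """Hypothetical helper: stop calling a failing service for a cool-down period."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds
        self.clock = clock                # injectable clock, useful in tests
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit fully
        return result
```

Wrapping each external call in `breaker.call(...)` lets the agent fail fast while a dependency is down instead of stacking up slow requests.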

A development team is building an AI agent capable of autonomously planning and executing multi-step tasks while retaining context and learning from past interactions.

Which practice is most important to enable the agent to effectively manage long-term memory and complex tasks?

A. Implement memory mechanisms for context retention and apply chain-of-thought prompts to enhance reasoning.

B. Use basic rule-based decision methods that emphasize fast responses over adaptive planning.

C. Apply short-term memory approaches that handle each interaction independently of previous ones.

D. Reduce planning features and memory management to keep the system streamlined.

A healthcare AI company is deploying diagnostic agents that process medical imaging and patient data. The system must deliver consistent sub-100ms inference times for critical diagnoses while supporting deployment across multiple hospital sites with different NVIDIA GPU configurations (from RTX 6000 workstations to DGX systems). The agents need to maintain high accuracy while being portable across different hardware environments and capable of running efficiently on various GPU memory configurations.

Which optimization strategy would deliver the BEST performance improvements while maintaining deployment flexibility across diverse NVIDIA hardware configurations?

A. Deploy agents with NVIDIA CUDA-optimized Docker containers using a sequential inference architecture that processes each layer individually with GPU-to-CPU memory transfers between operations to avoid memory issues.

B. Deploy agents using NVIDIA NIM containers with CPU-optimized inference to avoid GPU memory constraints and ensure consistent performance across different hospital infrastructure configurations.

C. Deploy models using NVIDIA TensorRT optimization in their original FP32 precision format without any quantization or memory optimization, requiring 32GB+ GPU memory across all deployment sites.

D. Deploy agents using models optimized with post-training quantization and served through NVIDIA NIM for portable performance across different GPU platforms and memory configurations.
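
To make the term concrete, post-training quantization maps already-trained floating-point weights to a lower-precision integer format plus a scale factor. The sketch below shows a symmetric per-tensor INT8 scheme in plain Python. It is an illustration only and assumes nothing about how TensorRT or NIM quantize models internally; `quantize_int8` and `dequantize` are hypothetical helper names.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: floats -> (int8 values, scale)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0  # avoid dividing by zero
    # Round to the nearest integer step and clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the quantized values."""
    return [v * scale for v in q]


weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each restored weight differs from the original by at most half a step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Storing one byte per weight instead of four is what lets the same model fit on GPUs with less memory, at the cost of the small rounding error measured by `max_error`.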

You’re building a RAG system that uses RAG Fusion.

Which of the following approaches would be most effective in determining how to combine information from multiple retrieved chunks?

A. Filtering out chunks considered inconsistent with others before presenting information to the LLM.

B. Using the LLM to automatically identify the most important sentences within each chunk and combine them.

C. Manually selecting the most relevant sentences from each chunk and inserting them into the LLM prompt.

D. Concatenating the text from all retrieved chunks into a single block to form the response.
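
As background, RAG Fusion generally issues several rewritten versions of the user query and merges the resulting ranked lists with reciprocal rank fusion (RRF) before any chunks reach the LLM. The sketch below shows only that merging step. The function name, the chunk IDs, and the choice of `k=60` (a commonly used default) are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of chunk IDs into one fused ranking.

    Each chunk scores 1 / (k + rank) per list it appears in; scores are summed.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by chunk ID for deterministic output.
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical results for three rewrites of the same user query.
rankings = [
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["b", "c", "a"],
]
fused = reciprocal_rank_fusion(rankings)
```

Chunks that rank well across several query variants rise to the top, which decides which chunks are passed on to the combination step the question asks about.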

After a series of adjustments in a supply chain agentic system, the agent has dramatically reduced shipping times and minimized costs, but the team is receiving a high volume of complaints from customers regarding delayed deliveries.

Which metric is MOST important to prioritize when investigating this situation?

A. The agent’s ability to predict future demand fluctuations, as accurate forecasting is crucial for effective logistics.

B. The total cost savings achieved through the agent’s optimization, which represents a significant financial benefit.

C. The percentage of delivery times that fall within the acceptable delay window, considering customer satisfaction as a key factor.

D. The agent’s adherence to the prescribed delivery schedules, as it’s demonstrably improving efficiency.
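
The metric described in option C is simple arithmetic: the share of deliveries whose delay falls inside an agreed window. A minimal sketch, assuming delays are recorded in hours past the promised time and using a hypothetical function name and made-up sample data:

```python
def on_time_rate(delays_hours, acceptable_delay_hours):
    """Percentage of deliveries whose delay is within the acceptable window."""
    if not delays_hours:
        raise ValueError("no deliveries to evaluate")
    within = sum(1 for d in delays_hours if d <= acceptable_delay_hours)
    return 100.0 * within / len(delays_hours)


# Illustrative data only: hours late for five deliveries, 24-hour window.
rate = on_time_rate([0, 2, 5, 30, 48], acceptable_delay_hours=24)
```

Here three of the five deliveries fall inside the window, so `rate` is 60.0. An average shipping time can improve while this percentage drops, which is the kind of gap the scenario describes.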

When analyzing inconsistent performance across a fleet of customer service agents handling similar queries, which evaluation approach most effectively identifies root causes and optimization opportunities?

A. Assess performance data from recently improved agents and highlight strong results, using outcome comparisons to identify areas with the greatest impact on service quality.

B. Average performance metrics across all agents as this will smooth individual variations, query distribution differences, and temporal factors affecting agent behavior and accuracy.

C. Deploy stratified evaluation sampling across agent variants, query complexity levels, and temporal patterns while tracking decision paths using comparative analytics.

D. Review performance across both high- and low-accuracy agent groups, comparing case outcomes and identifying patterns contributing to top and bottom results.
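
Stratified evaluation sampling, as mentioned in option C, means drawing evaluation cases separately from each combination of factors rather than from the pooled logs. A rough sketch follows; the record fields (`variant`, `complexity`), the function name, and the sample data are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, per_stratum, seed=0):
    """Draw up to `per_stratum` records from each stratum defined by `key`."""
    rng = random.Random(seed)  # fixed seed so evaluation runs are repeatable
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    return {
        name: rng.sample(items, min(per_stratum, len(items)))
        for name, items in strata.items()
    }


# Hypothetical interaction log: 2 agent variants x 2 complexity levels.
records = [
    {"id": i, "variant": v, "complexity": c}
    for i, (v, c) in enumerate(
        [("v1", "simple")] * 5 + [("v1", "hard")] * 2
        + [("v2", "simple")] * 4 + [("v2", "hard")] * 1
    )
]
sample = stratified_sample(
    records, key=lambda r: (r["variant"], r["complexity"]), per_stratum=2
)
```

Every variant and complexity pair is represented, so rare but hard queries are not drowned out by the common easy ones when agents are compared.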

You are tasked with deploying a multi-modal agentic system that must respond to user queries with minimal latency while maintaining guardrails for safe and context-aware interactions.

Which of the following configurations best leverages NVIDIA’s AI stack to meet these requirements?

A. Integrate NeMo Guardrails, configure NIM microservices for optimized inference, use TensorRT-LLM for deployment, and profile the system using Triton Inference Server with multi-modal support.

B. Integrate NeMo Guardrails, use Omniverse to generate synthetic data, configure NIM microservices for optimized inference, use TensorRT-LLM for deployment, and profile the system using NeMo Agent Toolkit for multi-modal support.

C. Use NeMo Guardrails for safety, deploy the model with Triton Inference Server using default settings, and rely on hardware accelerators like GPU/TPU inference for cost efficiency.

D. Use NIM microservices for deployment, optionally use NeMo Guardrails unless one wants to minimize the inference overhead.

You are building an agent that performs financial analysis by retrieving and processing structured data from a client’s internal SQL database. The agent must handle occasional connection errors and retry the query up to a few times before failing gracefully.

Which approach best meets these requirements?

A. Use structured tool calls with built-in retry handling and timed delays inside the tool wrapper

B. Use few-shot prompting to guide the agent’s conversation flow and manually retry failed API responses

C. Use a reactive agent pattern that retries the query after a user confirms a retry attempt

D. Use memory to track the number of failed attempts and apply it in later retries
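
The retry behavior described in the scenario (a few attempts, timed delays, graceful failure) can be sketched as a tool wrapper like the one below. It is an assumption-laden illustration: `run_query` stands in for whatever callable actually executes SQL, the function name and result shape are hypothetical, and the exponential backoff schedule is one reasonable choice among several.

```python
import time


def run_query_with_retry(run_query, sql, max_attempts=3, base_delay=0.5,
                         sleep=time.sleep):
    """Call `run_query(sql)`, retrying on ConnectionError with backoff.

    Returns a structured result instead of raising, so the agent can
    report the failure gracefully.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return {"ok": True, "rows": run_query(sql)}
        except ConnectionError as exc:
            last_error = exc
            if attempt < max_attempts:
                # Wait 0.5s, 1s, 2s, ... before the next attempt.
                sleep(base_delay * 2 ** (attempt - 1))
    return {
        "ok": False,
        "error": f"query failed after {max_attempts} attempts: {last_error}",
    }
```

Keeping the retry loop inside the wrapper means the model sees one tool call with one structured outcome, rather than having to reason about transient connection errors itself.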

You are developing an agent that needs to perform a complex set of tasks repeatedly.

Why is periodic fine-tuning an important aspect of long-term knowledge retention for this type of agent?

A. It prevents the agent from becoming overly specialized to a single task.

B. It eliminates the need for external storage like RAG.

C. It prevents the agent from forgetting past successes and failures.

D. It guarantees the agent will produce the same output for the same input.

What benefits does a Kubernetes deployment offer over Slurm?

A. Kubernetes provides autoscaling, auto-restarts, dynamic task scheduling, error isolation with containers, and integrated monitoring.

B. Kubernetes is the best option for both training and inference, offering advantages for resource management and workload visibility over traditional HPC schedulers like Slurm.

C. Kubernetes is more optimized for batch jobs to achieve high throughput, and also provides for monitoring and failover in large-scale workloads.