
Amazon Web Services MLA-C01 - AWS Certified Machine Learning Engineer - Associate


An ML engineer is setting up an Amazon SageMaker AI pipeline for an ML model. The pipeline must automatically initiate a retraining job if any data drift is detected.

How should the ML engineer set up the pipeline to meet this requirement?

A. Use an AWS Glue crawler and an AWS Glue ETL job to detect data drift. Use AWS Glue triggers to automate the retraining job.

B. Use Amazon Managed Service for Apache Flink to detect data drift. Use an AWS Lambda function to automate the retraining job.

C. Use SageMaker Model Monitor to detect data drift. Use an AWS Lambda function to automate the retraining job.

D. Use Amazon QuickSight anomaly detection to detect data drift. Use an AWS Step Functions workflow to automate the retraining job.
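
As context for the Lambda-based automation in options B and C, a minimal sketch of a handler that starts a SageMaker retraining job follows. It assumes an EventBridge rule routes Model Monitor drift findings to the function; the image URI, role ARN, and S3 paths are placeholders, not part of the question.

    import boto3

    sm = boto3.client("sagemaker")

    def handler(event, context):
        # Invoked (for example) when an EventBridge rule matches a
        # Model Monitor monitoring-job result that reports violations.
        sm.create_training_job(
            TrainingJobName="retrain-" + context.aws_request_id[:8],
            AlgorithmSpecification={
                "TrainingImage": "<training-image-uri>",   # placeholder
                "TrainingInputMode": "File",
            },
            RoleArn="<sagemaker-execution-role-arn>",      # placeholder
            InputDataConfig=[{
                "ChannelName": "train",
                "DataSource": {"S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": "s3://example-bucket/train/", # placeholder
                }},
            }],
            OutputDataConfig={"S3OutputPath": "s3://example-bucket/output/"},
            ResourceConfig={"InstanceType": "ml.m5.xlarge",
                            "InstanceCount": 1,
                            "VolumeSizeInGB": 50},
            StoppingCondition={"MaxRuntimeInSeconds": 3600},
        )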

A company needs to combine data from multiple sources. The company must use Amazon Redshift Serverless to query an AWS Glue Data Catalog database and underlying data that is stored in an Amazon S3 bucket.

Select and order the correct steps from the following list to meet these requirements. Select each step one time or not at all. (Select and order three.)

• Attach the IAM role to the Redshift cluster.

• Attach the IAM role to the Redshift namespace.

• Create an external database in Amazon Redshift to point to the Data Catalog schema.

• Create an external schema in Amazon Redshift to point to the Data Catalog database.

• Create an IAM role for Amazon Redshift to use to access only the S3 bucket that contains underlying data.

• Create an IAM role for Amazon Redshift to use to access the Data Catalog and the S3 bucket that contains underlying data.
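
As a sketch of the "create an external schema" step from the list above, the statement below points Redshift Serverless at a Data Catalog database, executed here through the Redshift Data API; the workgroup, database, and role names are assumptions for illustration.

    import boto3

    rsd = boto3.client("redshift-data")

    sql = """
    CREATE EXTERNAL SCHEMA IF NOT EXISTS glue_schema
    FROM DATA CATALOG
    DATABASE 'my_glue_db'
    IAM_ROLE 'arn:aws:iam::111122223333:role/RedshiftCatalogRole'
    """

    rsd.execute_statement(
        WorkgroupName="my-serverless-workgroup",  # Redshift Serverless workgroup (assumption)
        Database="dev",
        Sql=sql,
    )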

A company wants to improve the sustainability of its ML operations.

Which actions will reduce the energy usage and computational resources that are associated with the company's training jobs? (Choose two.)

A. Use Amazon SageMaker Debugger to stop training jobs when non-converging conditions are detected.

B. Use Amazon SageMaker Ground Truth for data labeling.

C. Deploy models by using AWS Lambda functions.

D. Use AWS Trainium instances for training.

E. Use PyTorch or TensorFlow with the distributed training option.
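
As context for option A, the SageMaker Python SDK can attach a Debugger built-in rule with a StopTraining action so that a non-converging job halts instead of consuming further compute; a hedged sketch (the image URI and role are placeholders):

    from sagemaker.debugger import Rule, rule_configs, StopTraining
    from sagemaker.estimator import Estimator

    # Stop the job when the loss stops decreasing, rather than
    # letting it run to the time limit.
    stop_on_plateau = Rule.sagemaker(
        base_config=rule_configs.loss_not_decreasing(),
        actions=StopTraining(),
    )

    estimator = Estimator(
        image_uri="<training-image-uri>",       # placeholder
        role="<sagemaker-execution-role-arn>",  # placeholder
        instance_count=1,
        instance_type="ml.m5.xlarge",
        rules=[stop_on_plateau],
    )
    # estimator.fit(...) would then run with the rule attached.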

An ML engineer needs to deploy a trained model based on a genetic algorithm. Predictions can take several minutes, and requests can include up to 100 MB of data.

Which deployment solution will meet these requirements with the LEAST operational overhead?

A. Deploy the model on an Amazon EC2 Auto Scaling group behind an Application Load Balancer (ALB).

B. Deploy the model to a SageMaker AI real-time inference endpoint.

C. Deploy the model to a SageMaker AI Asynchronous Inference endpoint.

D. Deploy the model to Amazon ECS on Amazon EC2.
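
As context for option C, SageMaker Asynchronous Inference queues requests, accepts payloads up to 1 GB, and tolerates long processing times, which fits multi-minute predictions on 100 MB requests. A minimal deploy sketch with the SageMaker Python SDK; the image, model artifact, role, and bucket are placeholders:

    from sagemaker.model import Model
    from sagemaker.async_inference import AsyncInferenceConfig

    model = Model(
        image_uri="<inference-image-uri>",              # placeholder
        model_data="s3://example-bucket/model.tar.gz",  # placeholder
        role="<sagemaker-execution-role-arn>",          # placeholder
    )

    predictor = model.deploy(
        initial_instance_count=1,
        instance_type="ml.m5.xlarge",
        async_inference_config=AsyncInferenceConfig(
            output_path="s3://example-bucket/async-results/",  # results land in S3
        ),
    )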

A company is creating an application that will recommend products for customers to purchase. The application will make API calls to Amazon Q Business. The company must ensure that responses from Amazon Q Business do not include the name of the company's main competitor.

Which solution will meet this requirement?

A. Configure the competitor's name as a blocked phrase in Amazon Q Business.

B. Configure an Amazon Q Business retriever to exclude the competitor's name.

C. Configure an Amazon Kendra retriever for Amazon Q Business to build indexes that exclude the competitor's name.

D. Configure document attribute boosting in Amazon Q Business to deprioritize the competitor's name.
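
As context for option A, Amazon Q Business chat controls include a blocked-phrases list that suppresses specified terms in responses; a hedged boto3 sketch (the application ID and phrase are placeholders, and the field names should be verified against the current qbusiness API):

    import boto3

    qb = boto3.client("qbusiness")

    qb.update_chat_controls_configuration(
        applicationId="<application-id>",  # placeholder
        blockedPhrasesConfigurationUpdate={
            "blockedPhrasesToCreateOrUpdate": ["Example Competitor Inc"],  # placeholder name
            "systemMessageOverride": "I can't provide information on that topic.",
        },
    )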

A company uses an Amazon EMR cluster to run a data ingestion process for an ML model. An ML engineer notices that the processing time is increasing.

Which solution will reduce the processing time MOST cost-effectively?

A. Use Spot Instances to increase the number of primary nodes.

B. Use Spot Instances to increase the number of core nodes.

C. Use Spot Instances to increase the number of task nodes.

D. Use On-Demand Instances to increase the number of core nodes.
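
As context for option C, task nodes run executors but hold no HDFS data, so Spot interruptions are low-risk there; a sketch of adding a Spot task instance group to a running cluster with boto3 (the cluster ID, instance type, and count are placeholders):

    import boto3

    emr = boto3.client("emr")

    emr.add_instance_groups(
        JobFlowId="j-XXXXXXXXXXXXX",  # placeholder cluster ID
        InstanceGroups=[{
            "Name": "spot-task-nodes",
            "Market": "SPOT",
            "InstanceRole": "TASK",   # task nodes add compute only, no HDFS storage
            "InstanceType": "m5.xlarge",
            "InstanceCount": 4,
        }],
    )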

A company collects customer data daily and stores it as compressed files in an Amazon S3 bucket partitioned by date. Each month, analysts process the data, check data quality, and upload results to Amazon QuickSight dashboards.

An ML engineer needs to automatically check data quality before the data is sent to QuickSight, with the LEAST operational overhead.

Which solution will meet these requirements?

A. Run an AWS Glue crawler monthly and use AWS Glue Data Quality rules to check data quality.

B. Run an AWS Glue crawler and create a custom AWS Glue job with PySpark to evaluate data quality.

C. Use AWS Lambda with Python scripts triggered by S3 uploads to evaluate data quality.

D. Send S3 events to Amazon SQS and use Amazon CloudWatch Insights to evaluate data quality.
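
As context for option A, AWS Glue Data Quality rules are declared in DQDL and can be attached to a Data Catalog table that a crawler keeps up to date; a hedged sketch (the database, table, and rule contents are placeholders):

    import boto3

    glue = boto3.client("glue")

    glue.create_data_quality_ruleset(
        Name="monthly-customer-checks",
        # DQDL: declarative rules evaluated against the cataloged table
        Ruleset='Rules = [ RowCount > 0, IsComplete "customer_id" ]',
        TargetTable={
            "DatabaseName": "customer_db",  # placeholder
            "TableName": "daily_events",    # placeholder
        },
    )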

An ML engineer is building a logistic regression model to predict customer churn for subscription services. The dataset contains two string variables: location and job_seniority_level.

The location variable has 3 distinct values, and the job_seniority_level variable has over 10 distinct values.

The ML engineer must perform preprocessing on the variables.

Which solution will meet this requirement?

A. Apply tokenization to location. Apply ordinal encoding to job_seniority_level.

B. Apply one-hot encoding to location. Apply ordinal encoding to job_seniority_level.

C. Apply binning to location. Apply standard scaling to job_seniority_level.

D. Apply one-hot encoding to location. Apply standard scaling to job_seniority_level.
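
To make the distinction concrete, a minimal scikit-learn sketch of option B's approach: one-hot encoding for the nominal 3-value location, ordinal encoding for the ranked job_seniority_level (simplified to three levels here; the category values and their ordering are assumptions):

    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

    X = pd.DataFrame({
        "location": ["east", "west", "central", "east"],           # 3 distinct values
        "job_seniority_level": ["junior", "mid", "senior", "mid"]  # ordered levels (assumed)
    })
    y = [0, 1, 0, 1]  # toy churn labels

    preprocess = ColumnTransformer([
        ("loc", OneHotEncoder(), ["location"]),  # nominal: no order implied
        ("sen", OrdinalEncoder(categories=[["junior", "mid", "senior"]]),
         ["job_seniority_level"]),               # ordinal: preserves the ranking
    ])

    clf = Pipeline([("prep", preprocess), ("model", LogisticRegression())])
    clf.fit(X, y)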

An ML engineer wants to use Amazon SageMaker Data Wrangler to preprocess a dataset and then use the processed dataset to train a classification model. During preprocessing, the ML engineer notices that a text feature has thousands of distinct values, many of which differ only by spelling errors. The ML engineer needs to apply an encoding method so that, after preprocessing is complete, the text feature can be used to train the model.

Which solution will meet these requirements?

A. Perform ordinal encoding to represent categories of the feature.

B. Perform similarity encoding to represent categories of the feature.

C. Perform one-hot encoding to represent categories of the feature.

D. Perform target encoding to represent categories of the feature.
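
To illustrate why similarity encoding suits this case: rather than an exact-match indicator per raw string (which would give every misspelling its own column), each value is scored by string similarity against reference categories, so misspelled variants land near their intended category. A toy, self-contained sketch using difflib; the categories are invented for illustration:

    from difflib import SequenceMatcher

    def similarity_encode(value, prototypes):
        # One similarity score per prototype category; a misspelled value
        # ("Enginer") still scores highest against its intended category.
        return [SequenceMatcher(None, value.lower(), p.lower()).ratio()
                for p in prototypes]

    prototypes = ["Engineer", "Manager", "Analyst"]   # illustrative categories
    print(similarity_encode("Enginer", prototypes))   # highest score for "Engineer"
    print(similarity_encode("Mangaer", prototypes))   # highest score for "Manager"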

A company uses Amazon SageMaker for its ML workloads. The company's ML engineer receives a 50 MB Apache Parquet data file to build a fraud detection model. The file includes several correlated columns that are not required.

What should the ML engineer do to drop the unnecessary columns in the file with the LEAST effort?

A. Download the file to a local workstation. Perform one-hot encoding by using a custom Python script.

B. Create an Apache Spark job that uses a custom processing script on Amazon EMR.

C. Create a SageMaker processing job by calling the SageMaker Python SDK.

D. Create a data flow in SageMaker Data Wrangler. Configure a transform step.
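
As context for option D, the Data Wrangler transform is point-and-click, so no code is required; the operation it performs is roughly equivalent to this pandas sketch (the file path and column names are placeholders):

    import pandas as pd

    df = pd.read_parquet("fraud.parquet")               # the 50 MB input file (placeholder path)
    df = df.drop(columns=["corr_col_1", "corr_col_2"])  # placeholder: the unneeded correlated columns
    df.to_parquet("fraud_clean.parquet")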