Amazon Web Services MLA-C01 - AWS Certified Machine Learning Engineer - Associate

Total 241 questions

A company collects customer data daily and stores it as compressed files in an Amazon S3 bucket partitioned by date. Each month, analysts process the data, check data quality, and upload results to Amazon QuickSight dashboards.

An ML engineer needs to automatically check data quality before the data is sent to QuickSight, with the LEAST operational overhead.

Which solution will meet these requirements?

A.

Run an AWS Glue crawler monthly and use AWS Glue Data Quality rules to check data quality.

B.

Run an AWS Glue crawler and create a custom AWS Glue job with PySpark to evaluate data quality.

C.

Use AWS Lambda with Python scripts triggered by S3 uploads to evaluate data quality.

D.

Send S3 events to Amazon SQS and use Amazon CloudWatch Insights to evaluate data quality.

A company has a team of data scientists who use Amazon SageMaker notebook instances to test ML models. When the data scientists need new permissions, the company attaches the permissions to each individual role that was created during the creation of the SageMaker notebook instance.

The company needs to centralize management of the team's permissions.

Which solution will meet this requirement?

A.

Create a single IAM role that has the necessary permissions. Attach the role to each notebook instance that the team uses.

B.

Create a single IAM group. Add the data scientists to the group. Associate the group with each notebook instance that the team uses.

C.

Create a single IAM user. Attach the AdministratorAccess AWS managed IAM policy to the user. Configure each notebook instance to use the IAM user.

D.

Create a single IAM group. Add the data scientists to the group. Create an IAM role. Attach the AdministratorAccess AWS managed IAM policy to the role. Associate the role with the group. Associate the group with each notebook instance that the team uses.

A company wants to develop an ML model by using tabular data from its customers. The data contains meaningful ordered features with sensitive information that should not be discarded. An ML engineer must ensure that the sensitive data is masked before another team starts to build the model.

Which solution will meet these requirements?

A.

Use Amazon Macie to categorize the sensitive data.

B.

Prepare the data by using AWS Glue DataBrew.

C.

Run an AWS Batch job to change the sensitive data to random values.

D.

Run an Amazon EMR job to change the sensitive data to random values.

A company has an ML model in Amazon SageMaker AI. An ML engineer needs to implement a monitoring solution to automatically detect changes in the input data distribution of model features.

Which solution will meet this requirement with the LEAST operational overhead?

A.

Configure SageMaker Model Monitor. Establish a data quality baseline. Ensure that the emit_metrics option is enabled in the baseline constraints file. Configure an Amazon CloudWatch alarm to notify the company about changes in specific metrics that are related to data quality.

B.

Configure SageMaker Model Monitor. Establish a model quality baseline. Ensure that the comparison_method option is set to Robust in the baseline constraints file. Configure an Amazon CloudWatch alarm to notify the company about changes in model quality metrics.

C.

Use SageMaker Debugger with custom rules to track shifts in feature distributions. Configure Amazon CloudWatch alarms to notify the company when the rules detect significant changes.

D.

Use Amazon CloudWatch to directly observe the SageMaker AI endpoint's performance metrics. Manually analyze the CloudWatch logs for indicators of data drift or shifts in feature distribution.
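
Note on the scenario above: "changes in the input data distribution" means comparing the feature values seen in production against a baseline sample. The sketch below shows one generic way to quantify such a shift, a two-sample Kolmogorov-Smirnov statistic in plain Python. It is an illustration only and is not a description of how any AWS service computes drift; the function name, the sample values, and the 0.3 alert threshold are all hypothetical.

```python
import bisect


def ks_statistic(baseline, current):
    """Largest gap between the empirical CDFs of two samples (0.0 to 1.0)."""
    a = sorted(baseline)
    b = sorted(current)
    max_gap = 0.0
    # The gap between two step functions can only peak at an observed value.
    for v in sorted(set(a) | set(b)):
        cdf_a = bisect.bisect_right(a, v) / len(a)
        cdf_b = bisect.bisect_right(b, v) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


# Hypothetical feature values: training-time baseline vs. recent traffic.
baseline_ages = [23, 25, 31, 35, 38, 41, 44, 52]
recent_ages = [48, 51, 55, 58, 60, 63, 67, 70]

DRIFT_THRESHOLD = 0.3  # assumed alert threshold, chosen for illustration
drift_score = ks_statistic(baseline_ages, recent_ages)
drift_detected = drift_score > DRIFT_THRESHOLD
```

A score near 0 means the two samples look alike; a score near 1 means they barely overlap. A managed monitoring service replaces this hand-written check with a stored baseline and scheduled comparisons.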

A company has historical data that shows whether customers needed long-term support from company staff. The company needs to develop an ML model to predict whether new customers will require long-term support.

Which modeling approach should the company use to meet this requirement?

A.

Anomaly detection

B.

Linear regression

C.

Logistic regression

D.

Semantic segmentation
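
Note on the scenario above: the historical label is a yes/no outcome (long-term support needed or not), which makes this a binary classification problem. As a rough sketch of how a logistic regression model handles such a label, the plain-Python example below fits a single-feature model with gradient descent and turns the predicted probability into a yes/no answer. The feature (support tickets in the first month), the toy data, and all function names are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit weight w and bias b by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 (needs long-term support) or 0 (does not)."""
    return 1 if sigmoid(w * x + b) >= threshold else 0


# Hypothetical history: tickets opened in month one -> needed long-term support?
tickets = [0, 1, 2, 3, 6, 7, 8, 9]
needed_support = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic(tickets, needed_support)
new_customer_prediction = predict(w, b, 9)
```

The output is a probability that is thresholded into one of two classes, in contrast to linear regression, which predicts an unbounded numeric value.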

A company needs to create a central catalog for all the company's ML models. The models are in AWS accounts where the company developed the models initially. The models are hosted in Amazon Elastic Container Registry (Amazon ECR) repositories.

Which solution will meet these requirements?

A.

Configure ECR cross-account replication for each existing ECR repository. Ensure that each model is visible in each AWS account.

B.

Create a new AWS account with a new ECR repository as the central catalog. Configure ECR cross-account replication between the initial ECR repositories and the central catalog.

C.

Use the Amazon SageMaker Model Registry to create a model group for models hosted in Amazon ECR. Create a new AWS account. In the new account, use the SageMaker Model Registry as the central catalog. Attach a cross-account resource policy to each model group in the initial AWS accounts.

D.

Use an AWS Glue Data Catalog to store the models. Run an AWS Glue crawler to migrate the models from the ECR repositories to the Data Catalog. Configure cross-account access to the Data Catalog.

An ML engineer is using Amazon SageMaker AI to train an ML model. The ML engineer needs to use SageMaker AI automatic model tuning (AMT) features to tune the model hyperparameters over a large parameter space.

The model has 20 categorical hyperparameters and 7 continuous hyperparameters that can be tuned. The ML engineer needs to run the tuning job a maximum of 1,000 times. The ML engineer must ensure that each parameter trial is built based on the performance of the previous trial.

Which solution will meet these requirements?

A.

Define the search space as categorical parameters of 1,000 possible combinations. Use grid search.

B.

Define the search space as continuous parameters. Use random search. Set the maximum number of tuning jobs to 1,000.

C.

Define the search space as categorical parameters and continuous parameters. Use Bayesian optimization. Set the maximum number of training jobs to 1,000.

D.

Define the search space as categorical parameters and continuous parameters. Use grid search. Set the maximum number of tuning jobs to 1,000.

An ML engineer has an Amazon Comprehend custom model in Account A in the us-east-1 Region. The ML engineer needs to copy the model to Account B in the same Region.

Which solution will meet this requirement with the LEAST development effort?

A.

Use Amazon S3 to make a copy of the model. Transfer the copy to Account B.

B.

Create a resource-based IAM policy. Use the Amazon Comprehend ImportModel API operation to copy the model to Account B.

C.

Use AWS DataSync to replicate the model from Account A to Account B.

D.

Create an AWS Site-to-Site VPN connection between Account A and Account B to transfer the model.

A company wants to build an anomaly detection ML model. The model will use large-scale tabular data that is stored in an Amazon S3 bucket. The company does not have expertise in Python, Spark, or other languages for ML.

An ML engineer needs to transform and prepare the data for ML model training.

Which solution will meet these requirements?

A.

Prepare the data by using Amazon EMR Serverless applications that host Amazon SageMaker Studio notebooks.

B.

Prepare the data by using the Amazon SageMaker Data Wrangler visual interface in Amazon SageMaker Canvas.

C.

Run SQL queries from a JupyterLab space in Amazon SageMaker Studio. Process the data further by using pandas DataFrames.

D.

Prepare the data by using a JupyterLab notebook in Amazon SageMaker Studio.

An ML engineer is training a simple neural network model. The ML engineer tracks the performance of the model over time on a validation dataset. The model's performance improves substantially at first and then degrades after a specific number of epochs.

Which solutions will mitigate this problem? (Choose two.)

A.

Enable early stopping on the model.

B.

Increase dropout in the layers.

C.

Increase the number of layers.

D.

Increase the number of neurons.

E.

Investigate and reduce the sources of model bias.
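
Note on the scenario above: validation performance that improves and then degrades as training continues is the usual sign of overfitting. As a rough sketch of what early stopping does in that situation, the plain-Python function below scans a list of per-epoch validation losses and halts once the loss has failed to improve for a set number of epochs. The function name, the `patience` and `min_delta` parameters, and the loss values are hypothetical; deep learning frameworks ship their own equivalents.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses.

    Training stops after `patience` consecutive epochs in which the loss
    does not beat the best loss so far by more than `min_delta`.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    # Patience never ran out: training used every epoch.
    return best_epoch, len(val_losses) - 1


# Hypothetical validation losses: improving at first, then degrading.
losses = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47, 0.55]
best, stopped = early_stopping_epoch(losses, patience=2)
```

In practice the weights saved at the best epoch are restored after stopping, so the later, worse epochs are discarded.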