Summer Sale Limited Time 65% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: ecus65

Amazon Web Services MLS-C01 - AWS Certified Machine Learning - Specialty

Page: 8 / 10
Total 330 questions

A large consumer goods manufacturer has the following products on sale:

• 34 different toothpaste variants

• 48 different toothbrush variants

• 43 different mouthwash variants

The entire sales history of all these products is available in Amazon S3. Currently, the company is using custom-built autoregressive integrated moving average (ARIMA) models to forecast demand for these products. The company wants to predict the demand for a new product that will soon be launched.

Which solution should a machine learning specialist apply?

A.

Train a custom ARIMA model to forecast demand for the new product.

B.

Train an Amazon SageMaker DeepAR algorithm to forecast demand for the new product.

C.

Train an Amazon SageMaker k-means clustering algorithm to forecast demand for the new product.

D.

Train a custom XGBoost model to forecast demand for the new product.

A manufacturing company has a production line with sensors that collect hundreds of quality metrics. The company has stored sensor data and manual inspection results in a data lake for several months. To automate quality control, the machine learning team must build an automated mechanism that determines whether the produced goods are good quality, replacement market quality, or scrap quality based on the manual inspection results.

Which modeling approach will deliver the MOST accurate prediction of product quality?

A.

Amazon SageMaker DeepAR forecasting algorithm

B.

Amazon SageMaker XGBoost algorithm

C.

Amazon SageMaker Latent Dirichlet Allocation (LDA) algorithm

D.

A convolutional neural network (CNN) and ResNet

An aircraft engine manufacturing company is measuring 200 performance metrics in a time-series. Engineers

want to detect critical manufacturing defects in near-real time during testing. All of the data needs to be stored

for offline analysis.

What approach would be the MOST effective to perform near-real time defect detection?

A.

Use AWS IoT Analytics for ingestion, storage, and further analysis. Use Jupyter notebooks from withinAWS IoT Analytics to carry out analysis for anomalies.

B.

Use Amazon S3 for ingestion, storage, and further analysis. Use an Amazon EMR cluster to carry outApache Spark ML k-means clustering to determine anomalies.

C.

Use Amazon S3 for ingestion, storage, and further analysis. Use the Amazon SageMaker Random CutForest (RCF) algorithm to determine anomalies.

D.

Use Amazon Kinesis Data Firehose for ingestion and Amazon Kinesis Data Analytics Random Cut Forest(RCF) to perform anomaly detection. Use Kinesis Data Firehose to store data in Amazon S3 for furtheranalysis.

A machine learning (ML) specialist needs to extract embedding vectors from a text series. The goal is to provide a ready-to-ingest feature space for a data scientist to develop downstream ML predictive models. The text consists of curated sentences in English. Many sentences use similar words but in different contexts. There are questions and answers among the sentences, and the embedding space must differentiate between them.

Which options can produce the required embedding vectors that capture word context and sequential QA information? (Choose two.)

A.

Amazon SageMaker seq2seq algorithm

B.

Amazon SageMaker BlazingText algorithm in Skip-gram mode

C.

Amazon SageMaker Object2Vec algorithm

D.

Amazon SageMaker BlazingText algorithm in continuous bag-of-words (CBOW) mode

E.

Combination of the Amazon SageMaker BlazingText algorithm in Batch Skip-gram mode with a custom recurrent neural network (RNN)

An obtain relator collects the following data on customer orders: demographics, behaviors, location, shipment progress, and delivery time. A data scientist joins all the collected datasets. The result is a single dataset that includes 980 variables.

The data scientist must develop a machine learning (ML) model to identify groups of customers who are likely to respond to a marketing campaign.

Which combination of algorithms should the data scientist use to meet this requirement? (Select TWO.)

A.

Latent Dirichlet Allocation (LDA)

B.

K-means

C.

Se mantic feg mentation

D.

Principal component analysis (PCA)

E.

Factorization machines (FM)

A company wants to use machine learning (ML) to improve its customer churn prediction model. The company stores data in an Amazon Redshift data warehouse.

A data science team wants to use Amazon Redshift machine learning (Amazon Redshift ML) to build a model and run predictions for new data directly within the data warehouse.

Which combination of steps should the company take to use Amazon Redshift ML to meet these requirements? (Select THREE.)

A.

Define the feature variables and target variable for the churn prediction model.

B.

Use the SQL EXPLAIN_MODEL function to run predictions.

C.

Write a CREATE MODEL SQL statement to create a model.

D.

Use Amazon Redshift Spectrum to train the model.

E.

Manually export the training data to Amazon S3.

F.

Use the SQL prediction function to run predictions,

A data scientist is designing a repository that will contain many images of vehicles. The repository must scale automatically in size to store new images every day. The repository must support versioning of the images. The data scientist must implement a solution that maintains multiple immediately accessible copies of the data in different AWS Regions.

Which solution will meet these requirements?

A.

Amazon S3 with S3 Cross-Region Replication (CRR)

B.

Amazon Elastic Block Store (Amazon EBS) with snapshots that are shared in a secondary Region

C.

Amazon Elastic File System (Amazon EFS) Standard storage that is configured with Regional availability

D.

AWS Storage Gateway Volume Gateway

A machine learning engineer is building a bird classification model. The engineer randomly separates a dataset into a training dataset and a validation dataset. During the training phase, the model achieves very high accuracy. However, the model did not generalize well during validation of the validation dataset. The engineer realizes that the original dataset was imbalanced.

What should the engineer do to improve the validation accuracy of the model?

A.

Perform stratified sampling on the original dataset.

B.

Acquire additional data about the majority classes in the original dataset.

C.

Use a smaller, randomly sampled version of the training dataset.

D.

Perform systematic sampling on the original dataset.

A trucking company is collecting live image data from its fleet of trucks across the globe. The data is growing rapidly and approximately 100 GB of new data is generated every day. The company wants to explore machine learning uses cases while ensuring the data is only accessible to specific IAM users.

Which storage option provides the most processing flexibility and will allow access control with IAM?

A.

Use a database, such as Amazon DynamoDB, to store the images, and set the IAM policies to restrict access to only the desired IAM users.

B.

Use an Amazon S3-backed data lake to store the raw images, and set up the permissions using bucket policies.

C.

Setup up Amazon EMR with Hadoop Distributed File System (HDFS) to store the files, and restrict access to the EMR instances using IAM policies.

D.

Configure Amazon EFS with IAM policies to make the data available to Amazon EC2 instances owned by the IAM users.

A Machine Learning Specialist is building a convolutional neural network (CNN) that will classify 10 types of animals. The Specialist has built a series of layers in a neural network that will take an input image of an animal, pass it through a series of convolutional and pooling layers, and then finally pass it through a dense and fully connected layer with 10 nodes The Specialist would like to get an output from the neural network that is a probability distribution of how likely it is that the input image belongs to each of the 10 classes

Which function will produce the desired output?

A.

Dropout

B.

Smooth L1 loss

C.

Softmax

D.

Rectified linear units (ReLU)