Summer Sale Limited Time 65% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: ecus65

Amazon Web Services Data-Engineer-Associate - AWS Certified Data Engineer - Associate (DEA-C01)

A data engineer uses Amazon Kinesis Data Streams to ingest and process records that contain user behavior data from an application every day.

The data engineer notices that the data stream is experiencing throttling because hot shards receive much more data than other shards in the data stream.

How should the data engineer resolve the throttling issue?

A.

Use a random partition key to distribute the ingested records.

B.

Increase the number of shards in the data stream. Distribute the records across the shards.

C.

Limit the number of records that are sent each second by the producer to match the capacity of the stream.

D.

Decrease the size of the records that the producer sends to match the capacity of the stream.

A data engineer is configuring Amazon SageMaker Studio to use AWS Glue interactive sessions to prepare data for machine learning (ML) models.

The data engineer receives an access denied error when the data engineer tries to prepare the data by using SageMaker Studio.

Which change should the engineer make to gain access to SageMaker Studio?

A.

Add the AWSGlueServiceRole managed policy to the data engineer's IAM user.

B.

Add a policy to the data engineer's IAM user that includes the sts:AssumeRole action for the AWS Glue and SageMaker service principals in the trust policy.

C.

Add the AmazonSageMakerFullAccess managed policy to the data engineer's IAM user.

D.

Add a policy to the data engineer's IAM user that allows the sts:AddAssociation action for the AWS Glue and SageMaker service principals in the trust policy.

An ecommerce company processes millions of orders each day. The company uses AWS Glue ETL to collect data from multiple sources, clean the data, and store the data in an Amazon S3 bucket in CSV format by using the S3 Standard storage class. The company uses the stored data to conduct daily analysis.

The company wants to optimize costs for data storage and retrieval.

Which solution will meet this requirement?

A.

Transition the data to Amazon S3 Glacier Flexible Retrieval.

B.

Transition the data from Amazon S3 to an Amazon Aurora cluster.

C.

Configure AWS Glue ETL to transform the incoming data to Apache Parquet format.

D.

Configure AWS Glue ETL to use Amazon EMR to process incoming data in parallel.

A data engineer needs to join data from multiple sources to perform a one-time analysis job. The data is stored in Amazon DynamoDB, Amazon RDS, Amazon Redshift, and Amazon S3.

Which solution will meet this requirement MOST cost-effectively?

A.

Use an Amazon EMR provisioned cluster to read from all sources. Use Apache Spark to join the data and perform the analysis.

B.

Copy the data from DynamoDB, Amazon RDS, and Amazon Redshift into Amazon S3. Run Amazon Athena queries directly on the S3 files.

C.

Use Amazon Athena Federated Query to join the data from all data sources.

D.

Use Redshift Spectrum to query data from DynamoDB, Amazon RDS, and Amazon S3 directly from Redshift.

Files from multiple data sources arrive in an Amazon S3 bucket on a regular basis. A data engineer wants to ingest new files into Amazon Redshift in near real time when the new files arrive in the S3 bucket.

Which solution will meet these requirements?

A.

Use the query editor v2 to schedule a COPY command to load new files into Amazon Redshift.

B.

Use the zero-ETL integration between Amazon Aurora and Amazon Redshift to load new files into Amazon Redshift.

C.

Use AWS Glue job bookmarks to extract, transform, and load (ETL) load new files into Amazon Redshift.

D.

Use S3 Event Notifications to invoke an AWS Lambda function that loads new files into Amazon Redshift.

A data engineer needs to use AWS Step Functions to design an orchestration workflow. The workflow must parallel process a large collection of data files and apply a specific transformation to each file.

Which Step Functions state should the data engineer use to meet these requirements?

A.

Parallel state

B.

Choice state

C.

Map state

D.

Wait state

A data engineer needs to create an Amazon Athena table based on a subset of data from an existing Athena table named cities_world. The cities_world table contains cities that are located around the world. The data engineer must create a new table named cities_us to contain only the cities from cities_world that are located in the US.

Which SQL statement should the data engineer use to meet this requirement?

A.

Option A

B.

Option B

C.

Option C

D.

Option D

A company has a data lake in Amazon S3. The company collects AWS CloudTrail logs for multiple applications. The company stores the logs in the data lake, catalogs the logs in AWS Glue, and partitions the logs based on the year. The company uses Amazon Athena to analyze the logs.

Recently, customers reported that a query on one of the Athena tables did not return any data. A data engineer must resolve the issue.

Which combination of troubleshooting steps should the data engineer take? (Select TWO.)

A.

Confirm that Athena is pointing to the correct Amazon S3 location.

B.

Increase the query timeout duration.

C.

Use the MSCK REPAIR TABLE command.

D.

Restart Athena.

E.

Delete and recreate the problematic Athena table.

A company stores data from an application in an Amazon DynamoDB table that operates in provisioned capacity mode. The workloads of the application have predictable throughput load on a regular schedule. Every Monday, there is an immediate increase in activity early in the morning. The application has very low usage during weekends.

The company must ensure that the application performs consistently during peak usage times.

Which solution will meet these requirements in the MOST cost-effective way?

A.

Increase the provisioned capacity to the maximum capacity that is currently present during peak load times.

B.

Divide the table into two tables. Provision each table with half of the provisioned capacity of the original table. Spread queries evenly across both tables.

C.

Use AWS Application Auto Scaling to schedule higher provisioned capacity for peak usage times. Schedule lower capacity during off-peak times.

D.

Change the capacity mode from provisioned to on-demand. Configure the table to scale up and scale down based on the load on the table.

A retail company has a customer data hub in an Amazon S3 bucket. Employees from many countries use the data hub to support company-wide analytics. A governance team must ensure that the company's data analysts can access data only for customers who are within the same country as the analysts.

Which solution will meet these requirements with the LEAST operational effort?

A.

Create a separate table for each country's customer data. Provide access to each analyst based on the country that the analyst serves.

B.

Register the S3 bucket as a data lake location in AWS Lake Formation. Use the Lake Formation row-level security features to enforce the company's access policies.

C.

Move the data to AWS Regions that are close to the countries where the customers are. Provide access to each analyst based on the country that the analyst serves.

D.

Load the data into Amazon Redshift. Create a view for each country. Create separate 1AM roles for each country to provide access to data from each country. Assign the appropriate roles to the analysts.