
Amazon Web Services Data-Engineer-Associate - AWS Certified Data Engineer - Associate (DEA-C01)

A data engineer needs to debug an AWS Glue job that reads from Amazon S3 and writes to Amazon Redshift. The data engineer enabled the job bookmark feature for the AWS Glue job and set the job's maximum concurrency to 1.

The AWS Glue job is successfully writing the output to Amazon Redshift. However, the Amazon S3 files that were loaded during previous runs of the AWS Glue job are being reprocessed by subsequent runs.

What is the likely reason the AWS Glue job is reprocessing the files?

A.

The AWS Glue job does not have the s3:GetObjectAcl permission that is required for bookmarks to work correctly.

B.

The maximum concurrency for the AWS Glue job is set to 1.

C.

The data engineer incorrectly specified an older version of AWS Glue for the Glue job.

D.

The AWS Glue job does not have a required commit statement.
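
For reference: AWS Glue job bookmarks only persist their state when the job script initializes a Job object and calls job.commit() at the end of the run. The following is a minimal PySpark sketch of that pattern; the S3 path and transformation_ctx name are illustrative, and the job must also run with the --job-bookmark-option job-bookmark-enable argument.

```python
# Minimal AWS Glue (PySpark) script sketch. Job bookmarks only advance when
# the script initializes a Job object and calls job.commit() at the end;
# without the commit, every run reprocesses the same S3 files.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)  # required for bookmark state to load

# Bookmark-aware source: reads only files not seen by previous runs.
source = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-bucket/input/"]},
    format="json",
    transformation_ctx="s3_source",  # bookmarks are tracked per transformation_ctx
)

# ... transforms and the Amazon Redshift sink would go here ...

job.commit()  # persists the bookmark; omitting this causes reprocessing
```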

A data engineer is designing a new data lake architecture for a company. The data engineer plans to use Apache Iceberg tables and AWS Glue Data Catalog to achieve fast query performance and enhanced metadata handling. The data engineer needs to query historical data for trend analysis and optimize storage costs for a large volume of event data.

Which solution will meet these requirements with the LEAST development effort?

A.

Store Iceberg table data files in Amazon S3 Intelligent-Tiering.

B.

Define partitioning schemes based on event type and event date.

C.

Use AWS Glue Data Catalog to automatically optimize Iceberg storage.

D.

Run a custom AWS Glue job to compact Iceberg table data files.
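
As a point of comparison, moving Iceberg data files into S3 Intelligent-Tiering needs no custom jobs; a single lifecycle rule (or uploading objects with the INTELLIGENT_TIERING storage class) is enough. A minimal boto3 sketch, with placeholder bucket and prefix names:

```python
# Sketch: transition an Iceberg table's data files to S3 Intelligent-Tiering
# with one lifecycle rule, so storage-cost optimization needs no custom jobs.
# Bucket name and prefix are placeholders.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="example-data-lake",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "iceberg-data-to-intelligent-tiering",
                "Status": "Enabled",
                "Filter": {"Prefix": "warehouse/events/data/"},
                "Transitions": [
                    {"Days": 0, "StorageClass": "INTELLIGENT_TIERING"}
                ],
            }
        ]
    },
)
```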

A company uses Amazon DataZone as a data governance and business catalog solution. The company stores data in an Amazon S3 data lake. The company uses AWS Glue with an AWS Glue Data Catalog.

A data engineer needs to publish AWS Glue Data Quality scores to the Amazon DataZone portal.

Which solution will meet this requirement?

A.

Create a data quality ruleset with Data Quality Definition Language (DQDL) rules that apply to a specific AWS Glue table. Schedule the ruleset to run daily. Configure the Amazon DataZone project to have an Amazon Redshift data source. Enable the data quality configuration for the data source.

B.

Configure AWS Glue ETL jobs to use an Evaluate Data Quality transform. Define a data quality ruleset inside the jobs. Configure the Amazon DataZone project to have an AWS Glue data source. Enable the data quality configuration for the data source.

C.

Create a data quality ruleset with Data Quality Definition Language (DQDL) rules that apply to a specific AWS Glue table. Schedule the ruleset to run daily. Configure the Amazon DataZone project to have an AWS Glue data source. Enable the data quality configuration for the data source.

D.

Configure AWS Glue ETL jobs to use an Evaluate Data Quality transform. Define a data quality ruleset inside the jobs. Configure the Amazon DataZone project to have an Amazon Redshift data source. Enable the data quality configuration for the data source.
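
For context, a DQDL ruleset attached to a Glue table can be created once and evaluated on a schedule; the resulting scores are what the DataZone data quality configuration surfaces for an AWS Glue data source. A boto3 sketch, with placeholder database, table, and column names:

```python
# Sketch: define a DQDL ruleset against a specific Glue Data Catalog table.
# Scheduled evaluations of this ruleset produce the quality scores that
# Amazon DataZone can display. All names below are placeholders.
import boto3

glue = boto3.client("glue")
glue.create_data_quality_ruleset(
    Name="orders_daily_quality",
    Ruleset=(
        'Rules = ['
        ' IsComplete "order_id",'
        ' ColumnValues "amount" > 0,'
        ' RowCount > 1000'
        ' ]'
    ),
    TargetTable={
        "DatabaseName": "sales_db",
        "TableName": "orders",
    },
)
```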

A healthcare company uses Amazon Kinesis Data Streams to stream real-time health data from wearable devices, hospital equipment, and patient records.

A data engineer needs to find a solution to process the streaming data. The data engineer needs to store the data in an Amazon Redshift Serverless warehouse. The solution must support near real-time analytics of the streaming data and the previous day's data.

Which solution will meet these requirements with the LEAST operational overhead?

A.

Load the data into Amazon Kinesis Data Firehose. Configure Kinesis Data Firehose to load the data into Amazon Redshift.

B.

Use the streaming ingestion feature of Amazon Redshift.

C.

Load the data into Amazon S3. Use the COPY command to load the data into Amazon Redshift.

D.

Use the Amazon Aurora zero-ETL integration with Amazon Redshift.
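
For reference, Redshift streaming ingestion is configured entirely in SQL: an external schema mapped to Kinesis Data Streams plus a materialized view over the stream. The sketch below issues that DDL through the Redshift Data API against a Redshift Serverless workgroup; the IAM role, workgroup, stream, and view names are placeholders.

```python
# Sketch: set up Amazon Redshift streaming ingestion from Kinesis Data
# Streams on a Redshift Serverless workgroup via the Redshift Data API.
import boto3

rsd = boto3.client("redshift-data")

DDL = [
    # External schema mapped to Kinesis Data Streams.
    """
    CREATE EXTERNAL SCHEMA kds
    FROM KINESIS
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftStreamingRole';
    """,
    # Materialized view over the stream; auto refresh pulls new records
    # for near real-time queries.
    """
    CREATE MATERIALIZED VIEW health_events AUTO REFRESH YES AS
    SELECT approximate_arrival_timestamp,
           JSON_PARSE(kinesis_data) AS payload
    FROM kds."wearable-stream";
    """,
]

for sql in DDL:
    rsd.execute_statement(
        WorkgroupName="analytics-serverless",  # Redshift Serverless workgroup
        Database="dev",
        Sql=sql,
    )
```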

A company uses an Amazon Redshift provisioned cluster as its database. The Redshift cluster has five reserved ra3.4xlarge nodes and uses the KEY distribution style.

A data engineer notices that one of the nodes frequently has a CPU load over 90%. SQL queries that run on the node are queued. The other four nodes usually have a CPU load under 15% during daily operations.

The data engineer wants to maintain the current number of compute nodes. The data engineer also wants to balance the load more evenly across all five compute nodes.

Which solution will meet these requirements?

A.

Change the sort key to be the data column that is most often used in a WHERE clause of the SQL SELECT statement.

B.

Change the distribution key to the table column that has the largest dimension.

C.

Upgrade the reserved node from ra3.4xlarge to ra3.16xlarge.

D.

Change the primary key to be the data column that is most often used in a WHERE clause of the SQL SELECT statement.
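
For context, redistributing rows across slices is a single DDL statement in Redshift; choosing a high-cardinality distribution key (or DISTSTYLE EVEN) spreads rows, and therefore CPU load, across all five nodes. A Redshift Data API sketch with placeholder cluster, table, and column names:

```python
# Sketch: rebalance a hot slice by changing the table's distribution key.
# Redshift redistributes the table in the background after this DDL runs.
import boto3

rsd = boto3.client("redshift-data")
rsd.execute_statement(
    ClusterIdentifier="analytics-cluster",  # provisioned cluster
    Database="dev",
    DbUser="admin",
    Sql="ALTER TABLE sales ALTER DISTKEY customer_id;",
)
```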

A company uses Amazon S3 buckets, AWS Glue tables, and Amazon Athena as components of a data lake. Recently, the company expanded its sales range to multiple new states. The company wants to introduce state names as a new partition to the existing S3 bucket, which is currently partitioned by date.

The company needs to ensure that additional partitions will not disrupt daily synchronization between the AWS Glue Data Catalog and the S3 buckets.

Which solution will meet these requirements with the LEAST operational overhead?

A.

Use the AWS Glue API to manually update the Data Catalog.

B.

Run an MSCK REPAIR TABLE command in Athena.

C.

Schedule an AWS Glue crawler to periodically update the Data Catalog.

D.

Run a REFRESH TABLE command in Athena.
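
For reference, a scheduled crawler picks up new partitions, including the new state-name level, without manual catalog updates. A boto3 sketch with placeholder names, role, and path:

```python
# Sketch: a Glue crawler that runs daily and registers any new partitions
# it finds under the S3 path, keeping the Data Catalog in sync.
import boto3

glue = boto3.client("glue")
glue.create_crawler(
    Name="sales-partitions-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    DatabaseName="sales_db",
    Targets={"S3Targets": [{"Path": "s3://example-data-lake/sales/"}]},
    Schedule="cron(0 1 * * ? *)",  # daily at 01:00 UTC
    SchemaChangePolicy={
        "UpdateBehavior": "UPDATE_IN_DATABASE",
        "DeleteBehavior": "LOG",
    },
)
```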

A data engineer is building a new data pipeline that stores metadata in an Amazon DynamoDB table. The data engineer must ensure that all items that are older than a specified age are removed from the DynamoDB table daily.

Which solution will meet this requirement with the LEAST configuration effort?

A.

Enable DynamoDB TTL on the DynamoDB table. Adjust the application source code to set the TTL attribute appropriately.

B.

Create an Amazon EventBridge rule that uses a daily cron expression to trigger an AWS Lambda function to delete items that are older than the specified age.

C.

Add a lifecycle configuration to the DynamoDB table that deletes items that are older than the specified age.

D.

Create a DynamoDB stream that has an AWS Lambda function that reacts to data modifications. Configure the Lambda function to delete items that are older than the specified age.
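
For context, DynamoDB TTL requires only two pieces: enabling TTL on the table and writing an epoch-seconds expiry attribute with each item; DynamoDB then deletes expired items automatically, with no scheduled jobs to maintain. A boto3 sketch with placeholder table and attribute names:

```python
# Sketch: enable TTL on the table, then stamp each item with an
# epoch-seconds expiry attribute. DynamoDB removes expired items
# automatically in the background.
import time

import boto3

dynamodb = boto3.client("dynamodb")
dynamodb.update_time_to_live(
    TableName="pipeline_metadata",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
)

# Application side: set the expiry when writing (here, 30 days out).
dynamodb.put_item(
    TableName="pipeline_metadata",
    Item={
        "run_id": {"S": "run-2024-001"},
        "expires_at": {"N": str(int(time.time()) + 30 * 24 * 3600)},
    },
)
```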

A company receives .csv files that contain physical address data. The data is in columns that have the following names: Door_No, Street_Name, City, and Zip_Code. The company wants to create a single column to store these values in the following format:

Which solution will meet this requirement with the LEAST coding effort?

A.

Use AWS Glue DataBrew to read the files. Use the NEST TO ARRAY transformation to create the new column.

B.

Use AWS Glue DataBrew to read the files. Use the NEST TO MAP transformation to create the new column.

C.

Use AWS Glue DataBrew to read the files. Use the PIVOT transformation to create the new column.

D.

Write an AWS Lambda function in Python to read the files. Use the Python dictionary data type to create the new column.
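
To illustrate the shape of the output, the following plain-Python snippet builds the same kind of map-valued column that the DataBrew NEST TO MAP transformation produces without code; the file name is a placeholder.

```python
# Illustration only: collapse the four address columns into one
# map-valued column, the result NEST TO MAP yields in DataBrew.
import csv
import json

with open("addresses.csv", newline="") as f:
    for row in csv.DictReader(f):
        # Replace the four source columns with a single map column.
        row["Address"] = json.dumps(
            {
                "Door_No": row.pop("Door_No"),
                "Street_Name": row.pop("Street_Name"),
                "City": row.pop("City"),
                "Zip_Code": row.pop("Zip_Code"),
            }
        )
        print(row)
```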

A company uses AWS Glue jobs to implement several data pipelines. The pipelines are critical to the company.

The company needs to implement a monitoring mechanism that will alert stakeholders if the pipelines fail.

Which solution will meet these requirements with the LEAST operational overhead?

A.

Create an Amazon EventBridge rule to match AWS Glue job failure events. Configure the rule to target an AWS Lambda function to process events. Configure the function to send notifications to an Amazon Simple Notification Service (Amazon SNS) topic.

B.

Configure an Amazon CloudWatch Logs log group for the AWS Glue jobs. Create an Amazon EventBridge rule to match new log creation events in the log group. Configure the rule to target an AWS Lambda function that reads the logs and sends notifications to an Amazon Simple Notification Service (Amazon SNS) topic if AWS Glue job failure logs are present.

C.

Create an Amazon EventBridge rule to match AWS Glue job failure events. Define an Amazon CloudWatch metric based on the EventBridge rule. Set up a CloudWatch alarm based on the metric to send notifications to an Amazon Simple Notification Service (Amazon SNS) topic.

D.

Configure an Amazon CloudWatch Logs log group for the AWS Glue jobs. Create an Amazon EventBridge rule to match new log creation events in the log group. Configure the rule to send notifications to an Amazon Simple Notification Service (Amazon SNS) topic.
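
For reference, AWS Glue emits job state-change events to EventBridge directly, so no log scraping is needed. The sketch below uses the documented event pattern; the rule name and SNS topic ARN are placeholders, and the topic's access policy must allow events.amazonaws.com to publish.

```python
# Sketch: match Glue job failure events in EventBridge and fan out
# to an SNS topic for stakeholder notifications.
import json

import boto3

events = boto3.client("events")
events.put_rule(
    Name="glue-job-failures",
    EventPattern=json.dumps(
        {
            "source": ["aws.glue"],
            "detail-type": ["Glue Job State Change"],
            "detail": {"state": ["FAILED", "TIMEOUT", "ERROR"]},
        }
    ),
    State="ENABLED",
)
events.put_targets(
    Rule="glue-job-failures",
    Targets=[
        {
            "Id": "notify-stakeholders",
            "Arn": "arn:aws:sns:us-east-1:123456789012:pipeline-alerts",
        }
    ],
)
```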

A company stores sensitive transaction data in an Amazon S3 bucket. A data engineer must implement controls to prevent accidental deletions.

Which solution will meet this requirement?

A.

Enable versioning on the S3 bucket and configure MFA delete.

B.

Configure an S3 bucket policy rule that denies the creation of S3 delete markers.

C.

Create an S3 Lifecycle rule that moves deleted files to S3 Glacier Deep Archive.

D.

Set up AWS Config remediation actions to prevent users from deleting S3 objects.
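
For context, versioning plus MFA delete is configured in one API call, though MFA delete can only be enabled with the bucket owner's root credentials and a current MFA token. A boto3 sketch with placeholder bucket and MFA values:

```python
# Sketch: versioning preserves deleted objects as prior versions, and
# MFA delete requires a one-time code to permanently remove a version.
# Must be run with the root account's credentials; values are placeholders.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_versioning(
    Bucket="sensitive-transactions",
    VersioningConfiguration={"Status": "Enabled", "MFADelete": "Enabled"},
    # Format: "<mfa-device-serial-arn> <current-6-digit-code>"
    MFA="arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456",
)
```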