
Databricks Databricks-Certified-Data-Engineer-Associate - Databricks Certified Data Engineer Associate Exam

A data engineer has been using a Databricks SQL dashboard to monitor the cleanliness of the input data to a data analytics dashboard for a retail use case. The job has a Databricks SQL query that returns the number of store-level records where sales is equal to zero. The data engineer wants their entire team to be notified via a messaging webhook whenever this value is greater than 0.

Which of the following approaches can the data engineer use to notify their entire team via a messaging webhook whenever the number of stores with $0 in sales is greater than zero?

A.

They can set up an Alert with a custom template.

B.

They can set up an Alert with a new email alert destination.

C.

They can set up an Alert with one-time notifications.

D.

They can set up an Alert with a new webhook alert destination.

E.

They can set up an Alert without notifications.

A data engineer wants to schedule their Databricks SQL dashboard to refresh every hour, but they only want the associated SQL endpoint to be running when it is necessary. The dashboard has multiple queries on multiple datasets associated with it. The data that feeds the dashboard is automatically processed using a Databricks Job.

Which approach can the data engineer use to minimize the total running time of the SQL endpoint used in the refresh schedule of their dashboard?

A.

They can reduce the cluster size of the SQL endpoint.

B.

They can turn on the Auto Stop feature for the SQL endpoint.

C.

They can set up the dashboard's SQL endpoint to be serverless.

D.

They can ensure the dashboard's SQL endpoint matches each of the queries' SQL endpoints.

Which of the following describes when to use the CREATE STREAMING LIVE TABLE (formerly CREATE INCREMENTAL LIVE TABLE) syntax over the CREATE LIVE TABLE syntax when creating Delta Live Tables (DLT) tables using SQL?

A.

CREATE STREAMING LIVE TABLE should be used when the subsequent step in the DLT pipeline is static.

B.

CREATE STREAMING LIVE TABLE should be used when data needs to be processed incrementally.

C.

CREATE STREAMING LIVE TABLE is redundant for DLT and it does not need to be used.

D.

CREATE STREAMING LIVE TABLE should be used when data needs to be processed through complicated aggregations.

E.

CREATE STREAMING LIVE TABLE should be used when the previous step in the DLT pipeline is static.
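
For illustration, the same incremental-versus-batch distinction can be sketched with the Python DLT API, where a streaming table reads its source incrementally with dlt.read_stream while a regular live table is recomputed from dlt.read on each update. This is only a minimal sketch; the table and column names are hypothetical.

import dlt
from pyspark.sql import functions as F

@dlt.table  # behaves like CREATE STREAMING LIVE TABLE: new records are processed incrementally
def orders_clean():
    return dlt.read_stream("orders_raw").where("amount IS NOT NULL")

@dlt.table  # behaves like CREATE LIVE TABLE: recomputed as a batch on each pipeline update
def daily_totals():
    return dlt.read("orders_clean").groupBy("order_date").agg(F.sum("amount").alias("total"))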

Which of the following data lakehouse features results in improved data quality over a traditional data lake?

A.

A data lakehouse provides storage solutions for structured and unstructured data.

B.

A data lakehouse supports ACID-compliant transactions.

C.

A data lakehouse allows the use of SQL queries to examine data.

D.

A data lakehouse stores data in open formats.

E.

A data lakehouse enables machine learning and artificial intelligence workloads.

Which of the following tools is used by Auto Loader to process data incrementally?

A.

Checkpointing

B.

Spark Structured Streaming

C.

Data Explorer

D.

Unity Catalog

E.

Databricks SQL
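
As background, Auto Loader is exposed as the cloudFiles source of Spark Structured Streaming and records which files it has already ingested in a checkpoint. The sketch below assumes a Databricks notebook where spark is predefined; the paths and table name are hypothetical.

# Auto Loader source, built on Spark Structured Streaming
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")                         # format of the incoming files
    .option("cloudFiles.schemaLocation", "/tmp/schemas/events")  # where the inferred schema is tracked
    .load("/mnt/raw/events")                                     # hypothetical input directory
)

(
    df.writeStream
    .option("checkpointLocation", "/tmp/checkpoints/events")  # checkpoint tracks which files were processed
    .trigger(availableNow=True)                                # process available files, then stop
    .toTable("bronze_events")                                  # hypothetical target table
)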

A data engineer needs to apply custom logic to identify employees with more than 5 years of experience in the array column employees in the table stores. The custom logic should create a new column exp_employees that is an array of all of the employees with more than 5 years of experience for each row. In order to apply this custom logic at scale, the data engineer wants to use the FILTER higher-order function.

Which of the following code blocks successfully completes this task?

A.

Option A

B.

Option B

C.

Option C

D.

Option D

E.

Option E
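
Because the answer options are presented as code images, here is a hedged sketch of how the FILTER higher-order function could be applied from a Databricks notebook; the struct field name years_experience inside the employees array is an assumption.

# FILTER keeps only the array elements that satisfy the lambda condition.
# The field name `years_experience` is a hypothetical placeholder.
exp_df = spark.sql("""
    SELECT *,
           FILTER(employees, e -> e.years_experience > 5) AS exp_employees
    FROM stores
""")
exp_df.show()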

A data engineer has three tables in a Delta Live Tables (DLT) pipeline. They have configured the pipeline to drop invalid records at each table. They notice that some data is being dropped due to quality concerns at some point in the DLT pipeline. They would like to determine at which table in their pipeline the data is being dropped.

Which of the following approaches can the data engineer take to identify the table that is dropping the records?

A.

They can set up separate expectations for each table when developing their DLT pipeline.

B.

They cannot determine which table is dropping the records.

C.

They can set up DLT to notify them via email when records are dropped.

D.

They can navigate to the DLT pipeline page, click on each table, and view the data quality statistics.

E.

They can navigate to the DLT pipeline page, click on the “Error” button, and review the present errors.
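
For reference, expectations are how DLT records per-table data quality metrics; once a per-table expectation is declared, drop counts appear for that table on the pipeline page. A minimal sketch in the Python DLT API, with a hypothetical table, source, and condition:

import dlt

@dlt.table
@dlt.expect_or_drop("valid_sales", "sales IS NOT NULL AND sales >= 0")  # rows failing the condition are dropped and counted
def sales_clean():
    return dlt.read_stream("sales_raw")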

A new data engineering team has been assigned to work on a project. The team will need access to database customers in order to see what tables already exist. The team has its own group, team.

Which of the following commands can be used to grant the necessary permission on the entire database to the new team?

A.

GRANT VIEW ON CATALOG customers TO team;

B.

GRANT CREATE ON DATABASE customers TO team;

C.

GRANT USAGE ON CATALOG team TO customers;

D.

GRANT CREATE ON DATABASE team TO customers;

E.

GRANT USAGE ON DATABASE customers TO team;
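
As a point of reference, these grants are plain SQL statements and can also be issued from a Python notebook via spark.sql, assuming the caller has the privileges required to grant them. The sketch below uses the database and group names from the question; the additional SELECT grant is only illustrative of how read access is usually completed.

spark.sql("GRANT USAGE ON DATABASE customers TO team")   # prerequisite privilege for working with the database
spark.sql("GRANT SELECT ON DATABASE customers TO team")  # illustrative: also lets the group read table contents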

A data engineer has realized that they made a mistake when making a daily update to a table. They need to use Delta time travel to restore the table to a version that is 3 days old. However, when the data engineer attempts to time travel to the older version, they are unable to restore the data because the data files have been deleted.

Which of the following explains why the data files are no longer present?

A.

The VACUUM command was run on the table

B.

The TIME TRAVEL command was run on the table

C.

The DELETE HISTORY command was run on the table

D.

The OPTIMIZE command was run on the table

E.

The HISTORY command was run on the table
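
As context, Delta time travel can read an older table version only while its underlying data files still exist, and VACUUM is the command that permanently removes files that fall outside the retention window. A minimal sketch, assuming a Databricks notebook and a hypothetical table named sales:

# Query the table as it looked three days ago (works only if the data files still exist).
spark.sql("SELECT * FROM sales TIMESTAMP AS OF current_timestamp() - INTERVAL 3 DAYS").show()

# VACUUM permanently deletes data files that are no longer referenced by recent versions,
# which is why time travel to older versions can stop working afterwards.
spark.sql("VACUUM sales RETAIN 168 HOURS")  # 168 hours = the default 7-day retention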

A data engineer has a Python variable table_name that they would like to use in a SQL query. They want to construct a Python code block that will run the query using table_name.

They have the following incomplete code block:

____(f"SELECT customer_id, spend FROM {table_name}")

Which of the following can be used to fill in the blank to successfully complete the task?

A.

spark.delta.sql

B.

spark.delta.table

C.

spark.table

D.

dbutils.sql

E.

spark.sql
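
For context, spark.sql takes a SQL string and returns a DataFrame, so a Python f-string is a common way to substitute a table name into the query. A minimal sketch, assuming a Databricks notebook where spark is predefined and using a hypothetical table name:

table_name = "customer_spend"  # hypothetical table name
result_df = spark.sql(f"SELECT customer_id, spend FROM {table_name}")  # run the query built from the variable
result_df.show()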