New Year Sale Special Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: xmas50

Databricks Databricks-Certified-Data-Engineer-Associate - Databricks Certified Data Engineer Associate Exam

A data engineer needs to parse only png files in a directory that contains files with different suffixes. Which code should the data engineer use to achieve this task?

A)

B)

C)

D)

A.

Option A

B.

Option B

C.

Option C

D.

Option D

A new data engineering team team has been assigned to an ELT project. The new data engineering team will need full privileges on the table sales to fully manage the project.

Which of the following commands can be used to grant full permissions on the database to the new data engineering team?

A.

GRANT ALL PRIVILEGES ON TABLE sales TO team;

B.

GRANT SELECT CREATE MODIFY ON TABLE sales TO team;

C.

GRANT SELECT ON TABLE sales TO team;

D.

GRANT USAGE ON TABLE sales TO team;

E.

GRANT ALL PRIVILEGES ON TABLE team TO sales;

A data engineer is designing a data pipeline. The source system generates files in a shared directory that is also used by other processes. As a result, the files should be kept as is and will accumulate in the directory. The data engineer needs to identify which files are new since the previous run in the pipeline, and set up the pipeline to only ingest those new files with each run.

Which of the following tools can the data engineer use to solve this problem?

A.

Unity Catalog

B.

Delta Lake

C.

Databricks SQL

D.

Data Explorer

E.

Auto Loader

Which of the following describes the type of workloads that are always compatible with Auto Loader?

A.

Dashboard workloads

B.

Streaming workloads

C.

Machine learning workloads

D.

Serverless workloads

E.

Batch workloads

A data engineer is maintaining an ETL pipeline code with a GitHub repository linked to their Databricks account. The data engineer wants to deploy the ETL pipeline to production as a databricks workflow.

Which approach should the data engineer use?

A.

Databricks Asset Bundles (DAB) + GitHub Integration

B.

Maintain workflow_config.j son and deploy it using Databricks CLI

C.

Manually create and manage the workflow in Ul

D.

Maintain workflow_conf ig. json and deploy it using Terraform

A data engineer has created a new database using the following command:

CREATE DATABASE IF NOT EXISTS customer360;

In which of the following locations will the customer360 database be located?

A.

dbfs:/user/hive/database/customer360

B.

dbfs:/user/hive/warehouse

C.

dbfs:/user/hive/customer360

D.

More information is needed to determine the correct response

Which TWO items are characteristics of the Gold Layer?

Choose 2 answers

A.

Read-optimized

B.

Normalised

C.

Raw Data

D.

Historical lineage

E.

De-normalised

A single Job runs two notebooks as two separate tasks. A data engineer has noticed that one of the notebooks is running slowly in the Job’s current run. The data engineer asks a tech lead for help in identifying why this might be the case.

Which of the following approaches can the tech lead use to identify why the notebook is running slowly as part of the Job?

A.

They can navigate to the Runs tab in the Jobs UI to immediately review the processing notebook.

B.

They can navigate to the Tasks tab in the Jobs UI and click on the active run to review the processing notebook.

C.

They can navigate to the Runs tab in the Jobs UI and click on the active run to review the processing notebook.

D.

There is no way to determine why a Job task is running slowly.

E.

They can navigate to the Tasks tab in the Jobs UI to immediately review the processing notebook.

Identify the impact of ON VIOLATION DROP ROW and ON VIOLATION FAIL UPDATE for a constraint violation.

A data engineer has created an ETL pipeline using Delta Live table to manage their company travel reimbursement detail, they want to ensure that the if the location details has not been provided by the employee, the pipeline needs to be terminated.

How can the scenario be implemented?

A.

CONSTRAINT valid_location EXPECT (location = NULL)

B.

CONSTRAINT valid_location EXPECT (location != NULL) ON VIOLATION FAIL UPDATE

C.

CONSTRAINT valid_location EXPECT (location != NULL) ON DROP ROW

D.

CONSTRAINT valid_location EXPECT (location != NULL) ON VIOLATION FAIL

A data engineer wants to reduce costs and optimize cloud spending. The data engineer has decided to use Databricks Serverless for lowering cloud costs while maintaining existing SLAs.

What is the first step in migrating to Databricks Serverless?

A.

Legacy Ingestion pipelines that include ingestion from sources API's, files, JDBC/ODBC connections

B.

Low frequency Bl Dashboarding and Adhoc SQL Analytics

C.

A frequently running and efficient Python-based data transformation pipeline compatible with the latest Databricks runtime and Unity Catalog

D.

A frequently running and efficient Scala-based data transformation pipeline compatible with the latest Databricks runtime and Unity Catalog