Weekend Sale Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: xmas50

Databricks Databricks-Certified-Data-Engineer-Associate - Databricks Certified Data Engineer Associate Exam

A Delta Live Table pipeline includes two datasets defined using STREAMING LIVE TABLE. Three datasets are defined against Delta Lake table sources using LIVE TABLE.

The table is configured to run in Production mode using the Continuous Pipeline Mode.

Assuming previously unprocessed data exists and all definitions are valid, what is the expected outcome after clicking Start to update the pipeline?

A.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist to allow for additional testing.

B.

All datasets will be updated once and the pipeline will persist without any processing. The compute resources will persist but go unused.

C.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will be deployed for the update and terminated when the pipeline is stopped.

D.

All datasets will be updated once and the pipeline will shut down. The compute resources will be terminated.

E.

All datasets will be updated once and the pipeline will shut down. The compute resources will persist to allow for additional testing.

A data engineer has created a new database using the following command:

CREATE DATABASE IF NOT EXISTS customer360;

In which of the following locations will the customer360 database be located?

A.

dbfs:/user/hive/database/customer360

B.

dbfs:/user/hive/warehouse

C.

dbfs:/user/hive/customer360

D.

More information is needed to determine the correct response

Which of the following Git operations must be performed outside of Databricks Repos?

A.

Commit

B.

Pull

C.

Push

D.

Clone

E.

Merge

A new data engineering team team. has been assigned to an ELT project. The new data engineering team will need full privileges on the database customers to fully manage the project.

Which of the following commands can be used to grant full permissions on the database to the new data engineering team?

A.

GRANT USAGE ON DATABASE customers TO team;

B.

GRANT ALL PRIVILEGES ON DATABASE team TO customers;

C.

GRANT SELECT PRIVILEGES ON DATABASE customers TO teams;

D.

GRANT SELECT CREATE MODIFY USAGE PRIVILEGES ON DATABASE customers TO team;

E.

GRANT ALL PRIVILEGES ON DATABASE customers TO team;

A Delta Live Table pipeline includes two datasets defined using streaming live table. Three datasets are defined against Delta Lake table sources using live table.

The table is configured to run in Production mode using the Continuous Pipeline Mode.

What is the expected outcome after clicking Start to update the pipeline assuming previously unprocessed data exists and all definitions are valid?

A.

All datasets will be updated once and the pipeline will shut down. The compute resources will be terminated.

B.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist to allow for additional testing.

C.

All datasets will be updated once and the pipeline will shut down. The compute resources will persist to allow for additional testing.

D.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will be deployed for the update and terminated when the pipeline is stopped.

A data engineer has joined an existing project and they see the following query in the project repository:

CREATE STREAMING LIVE TABLE loyal_customers AS

SELECT customer_id -

FROM STREAM(LIVE.customers)

WHERE loyalty_level = 'high';

Which of the following describes why the STREAM function is included in the query?

A.

The STREAM function is not needed and will cause an error.

B.

The table being created is a live table.

C.

The customers table is a streaming live table.

D.

The customers table is a reference to a Structured Streaming query on a PySpark DataFrame.

E.

The data in the customers table has been updated since its last run.

A data engineer has a Job that has a complex run schedule, and they want to transfer that schedule to other Jobs.

Rather than manually selecting each value in the scheduling form in Databricks, which of the following tools can the data engineer use to represent and submit the schedule programmatically?

A.

pyspark.sql.types.DateType

B.

datetime

C.

pyspark.sql.types.TimestampType

D.

Cron syntax

E.

There is no way to represent and submit this information programmatically

Which two components function in the DB platform architecture’s control plane? (Choose two.)

A.

Virtual Machines

B.

Compute Orchestration

C.

Serverless Compute

D.

Compute

E.

Unity Catalog

In which of the following file formats is data from Delta Lake tables primarily stored?

A.

Delta

B.

CSV

C.

Parquet

D.

JSON

E.

A proprietary, optimized format specific to Databricks

A data engineer is working with two tables. Each of these tables is displayed below in its entirety.

The data engineer runs the following query to join these tables together:

Which of the following will be returned by the above query?

A.

Option A

B.

Option B

C.

Option C

D.

Option D

E.

Option E