Databricks Databricks-Certified-Data-Engineer-Associate - Databricks Certified Data Engineer Associate Exam

A data engineer needs to create a table in Databricks using data from their organization's existing SQLite database. They run the following command:

CREATE TABLE jdbc_customer360
USING ____
OPTIONS (
  url "jdbc:sqlite:/customers.db", dbtable "customer360"
)

Which line of code fills in the above blank to successfully complete the task?

A.

autoloader

B.

org.apache.spark.sql.jdbc

C.

sqlite

D.

org.apache.spark.sql.sqlite

A data engineer wants to create an external table in Databricks that references data stored in an Azure Data Lake Storage (ADLS) location. The goal is to enable Databricks to access and query this external data without moving it into Databricks-managed storage.

Which step should the data engineer take to successfully create the external table?

A.

Use the CREATE TABLE statement and specify the LOCATION clause with the path to the external data.

B.

Use the CREATE UNMANAGED TABLE statement without specifying a LOCATION clause.

C.

Use the CREATE EXTERNAL TABLE statement without specifying a LOCATION clause.

D.

Use the CREATE MANAGED TABLE statement and specify the LOCATION clause with the path to the external data.

Calculate the total sales amount for each region and store the results in a new dataframe called region_sales.

Given the expected result:

Which code will generate the expected result?

A.

region_sales = sales_df.groupBy("region").agg(sum("sales_amount").alias("total_sales_amount"))

B.

region_sales = sales_df.sum("sales_amount").groupBy("region").alias("total_sales_amount")

C.

region_sales = sales_df.groupBy("category").sum("sales_amount").alias("total_sales_amount")

D.

region_sales = sales_df.agg(sum("sales_amount").groupBy("region").alias("total_sales_amount"))
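What a group-by-and-sum produces can be modelled in plain Python: one output row per distinct region, holding the sum of that region's sales amounts under the alias `total_sales_amount`. This is a minimal Spark-free sketch; the sample rows and the helper `total_sales_by_region` are hypothetical and do not come from the question's (missing) expected result.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for sales_df.
sales_rows = [
    {"region": "East", "sales_amount": 100.0},
    {"region": "West", "sales_amount": 250.0},
    {"region": "East", "sales_amount": 50.0},
    {"region": "North", "sales_amount": 75.0},
]

def total_sales_by_region(rows):
    """Plain-Python model of groupBy("region") followed by a summed, aliased aggregate."""
    totals = defaultdict(float)
    for row in rows:
        # Group key is the region; the aggregate is a running sum of sales_amount.
        totals[row["region"]] += row["sales_amount"]
    # One output row per group. Sorted only for stable output; Spark does not
    # guarantee row order after a groupBy.
    return [
        {"region": region, "total_sales_amount": total}
        for region, total in sorted(totals.items())
    ]

region_sales = total_sales_by_region(sales_rows)
for row in region_sales:
    print(row)
```

In real PySpark code, `sum` in these options has to be the column function from `pyspark.sql.functions` (for example `from pyspark.sql import functions as F`, then `F.sum("sales_amount")`); Python's built-in `sum` does not accept a column name.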

Which of the following SQL keywords can be used to convert a table from a long format to a wide format?

A.

PIVOT

B.

CONVERT

C.

WHERE

D.

TRANSFORM

E.

SUM
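Converting long to wide means turning the distinct values of one column into new columns, with each cell filled by an aggregate. The plain-Python sketch below models that reshaping on a hypothetical long table of (store, quarter, revenue) rows. It mirrors the shape a Spark SQL clause such as `PIVOT (SUM(revenue) FOR quarter IN ('Q1', 'Q2'))` produces, but it is an illustration, not Spark's implementation; `pivot_sum` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical long-format rows: one row per (store, quarter) observation.
long_rows = [
    ("store_a", "Q1", 10),
    ("store_a", "Q2", 15),
    ("store_b", "Q1", 7),
    ("store_a", "Q1", 5),  # repeated key: summed, like SUM(revenue) in a PIVOT
]

def pivot_sum(rows, pivot_values):
    """Reshape (key, pivot_col, value) rows into one wide row per key.

    Each entry in pivot_values becomes a column; combinations with no rows are
    None, matching the NULLs a SQL PIVOT leaves for absent groups.
    """
    cells = defaultdict(dict)
    for key, pivot_col, value in rows:
        cells[key][pivot_col] = cells[key].get(pivot_col, 0) + value
    wide = []
    for key in sorted(cells):
        row = {"store": key}
        for col in pivot_values:
            row[col] = cells[key].get(col)  # None when no rows matched
        wide.append(row)
    return wide

wide_rows = pivot_sum(long_rows, ["Q1", "Q2"])
for row in wide_rows:
    print(row)
```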

A data engineer is developing an ETL process based on Spark SQL. The execution fails. The data engineer checks the Spark UI and sees the following errors:

Which two corrective actions should the data engineer perform to resolve this issue?

Choose 2 answers

A.

Upsize the worker nodes and activate autoshuffle partitions

B.

Upsize the driver node and deactivate autoshuffle partitions

C.

Cache the dataset in order to boost the query performance

D.

Fix the shuffle partitions to 50 to ensure the allocation

E.

Narrow the filters in order to collect less data in the query

Which of the following tools is used by Auto Loader to process data incrementally?

A.

Checkpointing

B.

Spark Structured Streaming

C.

Data Explorer

D.

Unity Catalog

E.

Databricks SQL
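The plain-Python sketch below models only the bookkeeping idea behind incremental ingestion: a checkpoint remembers which files were already processed, so each run handles only new arrivals. It is not Auto Loader's API. On Databricks, Auto Loader runs as a Spark Structured Streaming source (`cloudFiles`) and keeps this state in the stream's checkpoint location. The helper `run_incremental_batch`, the JSON checkpoint file and the directory layout are hypothetical.

```python
import json
import os
import tempfile

def run_incremental_batch(source_dir, checkpoint_path):
    """Ingest only files not yet recorded in the checkpoint, then record them."""
    seen = set()
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            seen = set(json.load(f))
    # Discover files that arrived since the last successful run.
    new_files = sorted(name for name in os.listdir(source_dir) if name not in seen)
    for name in new_files:
        with open(os.path.join(source_dir, name)) as f:
            print(f"ingesting {name}: {f.read().strip()}")
    # Persist progress so the next run skips everything processed so far.
    with open(checkpoint_path, "w") as f:
        json.dump(sorted(seen | set(new_files)), f)
    return new_files

with tempfile.TemporaryDirectory() as tmp:
    landing = os.path.join(tmp, "landing")  # hypothetical landing zone
    os.mkdir(landing)
    checkpoint = os.path.join(tmp, "checkpoint.json")  # hypothetical checkpoint file

    for name, body in [("a.json", '{"id": 1}'), ("b.json", '{"id": 2}')]:
        with open(os.path.join(landing, name), "w") as f:
            f.write(body)
    first = run_incremental_batch(landing, checkpoint)   # ['a.json', 'b.json']

    with open(os.path.join(landing, "c.json"), "w") as f:
        f.write('{"id": 3}')
    second = run_incremental_batch(landing, checkpoint)  # ['c.json']
    third = run_incremental_batch(landing, checkpoint)   # []
```

Because progress is written only after the files are handled, a rerun after a crash would reprocess the last batch rather than skip it; the real streaming checkpoint is designed to give stronger (exactly-once) guarantees.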

Which of the following is hosted completely in the control plane of the classic Databricks architecture?

A.

Worker node

B.

JDBC data source

C.

Databricks web application

D.

Databricks Filesystem

E.

Driver node

Which TWO items are characteristics of the Gold Layer?

Choose 2 answers

A.

Read-optimized

B.

Normalised

C.

Raw Data

D.

Historical lineage

E.

De-normalised
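In the medallion architecture, Gold tables usually serve reporting and analytics, so they are typically de-normalised and read-optimized: joins and aggregations are done once when the table is built instead of on every query. The plain-Python sketch below shows that shape. The `customers` and `orders` tables and the `build_gold_customer_summary` helper are hypothetical; on Databricks this step would normally be a Spark or SQL job writing a Delta table.

```python
# Hypothetical normalised (Silver-style) tables: customer attributes live in one
# place, and orders reference customers only by id.
customers = {
    1: {"name": "Acme", "country": "US"},
    2: {"name": "Globex", "country": "DE"},
}
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 120.0},
    {"order_id": 11, "customer_id": 2, "amount": 80.0},
    {"order_id": 12, "customer_id": 1, "amount": 30.0},
]

def build_gold_customer_summary(customers, orders):
    """Join and aggregate once, producing one flat row per customer.

    Customer attributes are copied into every summary row (de-normalised), so
    readers can filter or chart this table without any further joins.
    """
    totals = {}
    for order in orders:
        cid = order["customer_id"]
        count, amount = totals.get(cid, (0, 0.0))
        totals[cid] = (count + 1, amount + order["amount"])
    summary = []
    for cid, (count, amount) in sorted(totals.items()):
        summary.append({
            "customer_id": cid,
            "name": customers[cid]["name"],
            "country": customers[cid]["country"],
            "order_count": count,
            "total_amount": amount,
        })
    return summary

gold_rows = build_gold_customer_summary(customers, orders)
for row in gold_rows:
    print(row)
```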

A data engineer wants to schedule their Databricks SQL dashboard to refresh every hour, but they only want the associated SQL endpoint to be running when it is necessary. The dashboard has multiple queries on multiple datasets associated with it. The data that feeds the dashboard is automatically processed using a Databricks Job.

Which of the following approaches can the data engineer use to minimize the total running time of the SQL endpoint used in the refresh schedule of their dashboard?

A.

They can turn on the Auto Stop feature for the SQL endpoint.

B.

They can ensure the dashboard's SQL endpoint is not one of the included queries' SQL endpoints.

C.

They can reduce the cluster size of the SQL endpoint.

D.

They can ensure the dashboard's SQL endpoint matches each of the queries' SQL endpoints.

E.

They can set up the dashboard's SQL endpoint to be serverless.
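Auto Stop shuts a SQL endpoint down after a configured period of inactivity, so under an hourly schedule the endpoint only runs for each refresh plus the idle window before it stops. The arithmetic below is a back-of-the-envelope sketch using hypothetical numbers (a 5-minute refresh, a 10-minute Auto Stop threshold, start-up time ignored). `running_minutes_per_day` is an illustrative helper, not a Databricks API.

```python
def running_minutes_per_day(refresh_minutes, auto_stop_minutes=None,
                            interval_minutes=60, refreshes_per_day=24):
    """Estimate daily endpoint running time under a fixed refresh schedule.

    Hypothetical model: each refresh keeps the endpoint busy for refresh_minutes,
    then it idles until Auto Stop shuts it down auto_stop_minutes later.
    Start-up time is ignored. Without Auto Stop, the endpoint never stops.
    """
    if auto_stop_minutes is None:
        return interval_minutes * refreshes_per_day
    # If busy time plus idle time exceeds the interval, the next refresh starts
    # before Auto Stop fires, so the endpoint runs for the whole interval.
    per_refresh = min(refresh_minutes + auto_stop_minutes, interval_minutes)
    return per_refresh * refreshes_per_day

always_on = running_minutes_per_day(refresh_minutes=5)
with_auto_stop = running_minutes_per_day(refresh_minutes=5, auto_stop_minutes=10)
print(always_on, with_auto_stop)  # 1440 360
```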

An organization needs to share a dataset stored in its Databricks Unity Catalog with an external partner who uses a different data platform that is not Databricks. The goal is to maintain data security and ensure the partner can access the data efficiently.

Which method should the data engineer use to securely share the dataset with the external partner?

A.

Using Delta Sharing with the open sharing protocol

B.

Exporting data as CSV files and emailing them

C.

Using a third-party API to access the Delta table

D.

Databricks-to-Databricks Sharing