Databricks Databricks-Certified-Data-Engineer-Associate - Databricks Certified Data Engineer Associate Exam


Total 176 questions

A data engineer needs to create a table in Databricks using data from their organization's existing SQLite database. They run the following command:

CREATE TABLE jdbc_customer360
USING ________
OPTIONS (
  url "jdbc:sqlite:/customers.db",
  dbtable "customer360"
)

Which line of code fills in the above blank to successfully complete the task?

autoloader

org.apache.spark.sql.jdbc

sqlite

org.apache.spark.sql.sqlite

Question # 42

A data engineer wants to create an external table in Databricks that references data stored in an Azure Data Lake Storage (ADLS) location. The goal is to enable Databricks to access and query this external data without moving it into Databricks-managed storage.

Which step should the data engineer take to successfully create the external table?

Use the CREATE TABLE statement and specify the LOCATION clause with the path to the external data.

Use the CREATE UNMANAGED TABLE statement without specifying a LOCATION clause.

Use the CREATE EXTERNAL TABLE statement without specifying a LOCATION clause.

Use the CREATE MANAGED TABLE statement and specify the LOCATION clause with the path to the external data.

Question # 43

Calculate the total sales amount for each region and store the results in a new dataframe called region_sales.

Given the expected result:

Which code will generate the expected result?

region_sales = sales_df.groupBy("region").agg(sum("sales_amount").alias("total_sales_amount"))

region_sales = sales_df.sum("sales_amount").groupBy("region").alias("total_sales_amount")

region_sales = sales_df.groupBy("category").sum("sales_amount").alias("total_sales_amount")

region_sales = sales_df.agg(sum("sales_amount").groupBy("region").alias("total_sales_amount"))
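Only the first option groups by region before aggregating. The sketch below reproduces that group-then-sum logic in plain Python so it runs without Spark. It is an illustration, not PySpark, and the sample rows are hypothetical stand-ins for `sales_df`.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for sales_df (region, sales_amount).
sales_rows = [
    {"region": "East", "sales_amount": 100.0},
    {"region": "West", "sales_amount": 250.0},
    {"region": "East", "sales_amount": 50.0},
]

def total_sales_by_region(rows):
    """Mimic sales_df.groupBy("region").agg(sum("sales_amount").alias("total_sales_amount"))."""
    totals = defaultdict(float)
    for row in rows:
        # Group by the key first, then accumulate the measure for that group.
        totals[row["region"]] += row["sales_amount"]
    # One output row per region, with the aggregate stored under its alias.
    return [{"region": r, "total_sales_amount": t} for r, t in sorted(totals.items())]

region_sales = total_sales_by_region(sales_rows)
print(region_sales)
```

In real PySpark code, the `sum` in the correct option must be `pyspark.sql.functions.sum` (often written `F.sum` after `import pyspark.sql.functions as F`). Python's built-in `sum` does not accept a column name.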

Question # 44

Which of the following SQL keywords can be used to convert a table from a long format to a wide format?

PIVOT

CONVERT

WHERE

TRANSFORM

SUM
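PIVOT turns the distinct values of one column into separate columns and aggregates any rows that share a key. The plain-Python sketch below models that long-to-wide reshape. The store and quarter data are hypothetical, and the Spark SQL form in the docstring is shown for comparison only.

```python
from collections import defaultdict

# Hypothetical long-format rows: one row per (store, quarter, amount).
long_rows = [
    ("store_a", "Q1", 10),
    ("store_a", "Q2", 15),
    ("store_b", "Q1", 7),
    ("store_b", "Q2", 9),
    ("store_a", "Q1", 5),  # duplicate key: PIVOT aggregates these (SUM here)
]

def pivot_sum(rows, pivot_values):
    """Long -> wide: one output row per store, one column per pivot value.

    Roughly what `SELECT * FROM t PIVOT (SUM(amount) FOR quarter IN ('Q1', 'Q2'))`
    does in Spark SQL; missing combinations become None (NULL).
    """
    cells = defaultdict(dict)
    for store, quarter, amount in rows:
        if quarter in pivot_values:
            cells[store][quarter] = cells[store].get(quarter, 0) + amount
    return {store: {q: cells[store].get(q) for q in pivot_values} for store in sorted(cells)}

wide = pivot_sum(long_rows, ["Q1", "Q2"])
print(wide)  # {'store_a': {'Q1': 15, 'Q2': 15}, 'store_b': {'Q1': 7, 'Q2': 9}}
```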

Question # 45

A data engineer is developing an ETL process based on Spark SQL. The execution fails. The data engineer checks the Spark UI and sees the following errors:

Which two corrective actions should the data engineer perform to resolve this issue?

Choose 2 answers

Narrow the filters in order to collect less data in the query

Upsize the worker nodes and activate autoshuffle partitions

Upsize the driver node and deactivate autoshuffle partitions

Cache the dataset in order to boost the query performance

Fix the shuffle partitions to 50 to ensure the allocation

Question # 46

Which of the following tools is used by Auto Loader to process data incrementally?

Checkpointing

Spark Structured Streaming

Data Explorer

Unity Catalog

Databricks SQL
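Auto Loader runs on Spark Structured Streaming, which commits its progress to a checkpoint so that each run picks up only files it has not already processed. The toy model below illustrates that idea in plain Python. A JSON list of file names stands in for the checkpoint state. This is an assumption made for illustration: Auto Loader's real checkpoint format and API are different, and the file names are hypothetical.

```python
import json
import os
import tempfile

def process_new_files(landing_files, checkpoint_path):
    """Return files not yet processed and record them in the checkpoint.

    Toy model only: a JSON list of file names stands in for the state that
    Structured Streaming / Auto Loader keeps in its checkpoint location.
    """
    seen = set()
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            seen = set(json.load(f))
    new_files = sorted(f for f in landing_files if f not in seen)
    # "Process" the batch here, then commit progress so a rerun skips these files.
    with open(checkpoint_path, "w") as f:
        json.dump(sorted(seen | set(new_files)), f)
    return new_files

with tempfile.TemporaryDirectory() as tmp:
    ckpt = os.path.join(tmp, "checkpoint.json")
    batch1 = process_new_files(["a.json", "b.json"], ckpt)
    batch2 = process_new_files(["a.json", "b.json", "c.json"], ckpt)
    batch3 = process_new_files(["a.json", "b.json", "c.json"], ckpt)

print(batch1, batch2, batch3)  # ['a.json', 'b.json'] ['c.json'] []
```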

Question # 47

Which of the following is hosted completely in the control plane of the classic Databricks architecture?

Worker node

JDBC data source

Databricks web application

Databricks Filesystem

Driver node

Explanation:

The Databricks web application is the user interface that allows you to create and manage workspaces, clusters, notebooks, jobs, and other resources. It is hosted completely in the control plane of the classic Databricks architecture, which includes the backend services that Databricks manages in your Databricks account. The other options are not hosted in the control plane. The driver and worker nodes are compute resources in the compute plane, which is in your own cloud account and network. DBFS storage and JDBC data sources also sit outside the control plane. References: Databricks architecture overview; Security and Trust Center

QUESTION NO: 4

Which of the following benefits of using the Databricks Lakehouse Platform is provided by Delta Lake?

A. The ability to manipulate the same data using a variety of languages

B. The ability to collaborate in real time on a single notebook

C. The ability to set up alerts for query failures

D. The ability to support batch and streaming workloads

E. The ability to distribute complex data operations

Answer: D

Delta Lake is the optimized storage layer that provides the foundation for storing data and tables in the Databricks lakehouse. Delta Lake is fully compatible with Apache Spark APIs and was developed for tight integration with Structured Streaming. This lets you use a single copy of data for both batch and streaming operations and provides incremental processing at scale [1]. Delta Lake supports upserts using the merge operation, which lets you efficiently update existing data or insert new data into your Delta tables [2]. Delta Lake also provides time travel capabilities, which allow you to query previous versions of your data or roll back to a specific point in time [3].

References:
[1] What is Delta Lake? | Databricks on AWS
[2] Upsert into a table using merge | Databricks on AWS
[3] Query an older snapshot of a table (time travel) | Databricks on AWS
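To make the merge (upsert) and time travel semantics concrete, here is an in-memory toy in plain Python. The class name and sample rows are hypothetical. Real Delta Lake records commits in a transaction log over Parquet files rather than keeping full copies of each version, so treat this as a model of the behavior, not of the implementation.

```python
class ToyVersionedTable:
    """In-memory stand-in for a Delta table: every write creates a new version.

    Illustration only; real Delta Lake records commits in a transaction log
    (_delta_log) over Parquet files instead of copying the whole table.
    """

    def __init__(self, key):
        self.key = key
        self.versions = [{}]  # version 0: empty table

    def merge(self, updates):
        # MERGE semantics: update rows whose key matches, insert the rest.
        current = dict(self.versions[-1])
        for row in updates:
            current[row[self.key]] = {**current.get(row[self.key], {}), **row}
        self.versions.append(current)
        return len(self.versions) - 1  # new version number

    def read(self, version=None):
        # Time travel: read the latest snapshot or any earlier version.
        snapshot = self.versions[-1 if version is None else version]
        return sorted(snapshot.values(), key=lambda r: r[self.key])

customers = ToyVersionedTable(key="id")
customers.merge([{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Lima"}])
customers.merge([{"id": 2, "city": "Quito"}, {"id": 3, "city": "Pune"}])
print(customers.read())           # latest: id 2 updated, id 3 inserted
print(customers.read(version=1))  # snapshot before the second merge
```

In Delta Lake itself, the corresponding operations are `MERGE INTO ... WHEN MATCHED THEN UPDATE ... WHEN NOT MATCHED THEN INSERT ...` and `SELECT ... FROM table VERSION AS OF n`.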


Question # 48

Which TWO items are characteristics of the Gold Layer?

Choose 2 answers

Read-optimized

Normalised

Raw Data

Historical lineage

De-normalised
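Gold tables are typically denormalized and read-optimized. Joins and aggregations are done once upstream so that dashboards read a flat table. The plain-Python sketch below builds such a table from two normalized, Silver-style inputs. The table names, columns, and values are all hypothetical.

```python
# Hypothetical normalized Silver tables: facts reference dimensions by key.
silver_customers = {
    1: {"name": "Acme", "segment": "Enterprise"},
    2: {"name": "Bolt", "segment": "SMB"},
}
silver_orders = [
    {"order_id": 10, "customer_id": 1, "amount": 120.0},
    {"order_id": 11, "customer_id": 2, "amount": 40.0},
    {"order_id": 12, "customer_id": 1, "amount": 80.0},
]

def build_gold_revenue_by_segment(orders, customers):
    """Join and pre-aggregate so consumers read one flat table with no joins."""
    gold = {}
    for order in orders:
        # Denormalize: pull the dimension attribute into the fact record.
        segment = customers[order["customer_id"]]["segment"]
        gold[segment] = gold.get(segment, 0.0) + order["amount"]
    return [{"segment": s, "revenue": r} for s, r in sorted(gold.items())]

gold_revenue = build_gold_revenue_by_segment(silver_orders, silver_customers)
print(gold_revenue)  # [{'segment': 'Enterprise', 'revenue': 200.0}, {'segment': 'SMB', 'revenue': 40.0}]
```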

Question # 49

A data engineer wants to schedule their Databricks SQL dashboard to refresh every hour, but they only want the associated SQL endpoint to be running when it is necessary. The dashboard has multiple queries on multiple datasets associated with it. The data that feeds the dashboard is automatically processed using a Databricks Job.

Which of the following approaches can the data engineer use to minimize the total running time of the SQL endpoint used in the refresh schedule of their dashboard?

They can turn on the Auto Stop feature for the SQL endpoint.

They can ensure the dashboard's SQL endpoint is not one of the included queries' SQL endpoints.

They can reduce the cluster size of the SQL endpoint.

They can ensure the dashboard's SQL endpoint matches each of the queries' SQL endpoints.

They can set up the dashboard's SQL endpoint to be serverless.
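Auto Stop shuts an endpoint down after a period of inactivity, so an hourly refresh only keeps it running for the query time plus the idle timeout. The toy model below does that arithmetic. The refresh duration and timeout are hypothetical, and the function is an illustration rather than a Databricks API.

```python
def running_minutes(refresh_starts, query_minutes, auto_stop_minutes):
    """Toy model: total minutes an on-demand endpoint stays up.

    refresh_starts: sorted minute offsets at which a dashboard refresh begins.
    Each refresh keeps the endpoint busy for query_minutes; afterwards it idles
    for auto_stop_minutes and then stops, unless the next refresh starts first.
    """
    total = 0
    for i, start in enumerate(refresh_starts):
        stop_at = start + query_minutes + auto_stop_minutes
        if i + 1 < len(refresh_starts):
            # If the next refresh arrives before the idle timeout, count only up to it.
            stop_at = min(stop_at, refresh_starts[i + 1])
        total += stop_at - start
    return total

# Hypothetical numbers: 24 hourly refreshes, 5 minutes of queries, 10-minute Auto Stop.
hourly = [hour * 60 for hour in range(24)]
with_auto_stop = running_minutes(hourly, query_minutes=5, auto_stop_minutes=10)
always_on = 24 * 60
print(with_auto_stop, always_on)  # 360 1440
```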

Question # 50

An organization needs to share a dataset stored in its Databricks Unity Catalog with an external partner who uses a different data platform that is not Databricks. The goal is to maintain data security and ensure the partner can access the data efficiently.

Which method should the data engineer use to securely share the dataset with the external partner?

Using Delta Sharing with the open sharing protocol

Exporting data as CSV files and emailing them

Using a third-party API to access the Delta table

Databricks-to-Databricks Sharing
