Databricks Databricks-Certified-Data-Engineer-Associate - Databricks Certified Data Engineer Associate Exam

Total 153 questions

Question # 11

A data engineer at a company that uses Databricks with Unity Catalog needs to share a collection of tables with an external partner who also uses a Databricks workspace enabled for Unity Catalog. The data engineer decides to use Delta Sharing to accomplish this.

What is the first piece of information the data engineer should request from the external partner to set up Delta Sharing?

Their Databricks account password

The name of their Databricks cluster

The IP address of their Databricks workspace

The sharing identifier of their Unity Catalog metastore

Question # 12

A data engineer has a Python notebook in Databricks, but they need to use SQL to accomplish a specific task within a cell. They still want all of the other cells to use Python without making any changes to those cells.

Which of the following describes how the data engineer can use SQL within a cell of their Python notebook?

It is not possible to use SQL in a Python notebook

They can attach the cell to a SQL endpoint rather than a Databricks cluster

They can simply write SQL syntax in the cell

They can add %sql to the first line of the cell

They can change the default language of the notebook to SQL

Question # 13

A data engineer is setting up access control in Unity Catalog and needs to ensure that a group of data analysts can query tables but not modify data.

Which permission should the data engineer grant to the data analysts?

SELECT

INSERT

MODIFY

ALL PRIVILEGES

Question # 14

Which of the following must be specified when creating a new Delta Live Tables pipeline?

A key-value pair configuration

The preferred DBU/hour cost

A path to cloud storage location for the written data

A location of a target database for the written data

At least one notebook library to be executed

Question # 15

A data engineer has developed a data pipeline to ingest data from a JSON source using Auto Loader, but the engineer has not provided any type inference or schema hints in their pipeline. Upon reviewing the data, the data engineer has noticed that all of the columns in the target table are of the string type despite some of the fields only including float or boolean values.

Which of the following describes why Auto Loader inferred all of the columns to be of the string type?

There was a type mismatch between the specified schema and the inferred schema

JSON data is a text-based format

Auto Loader only works with string data

All of the fields had at least one null value

Auto Loader cannot infer the schema of ingested data
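For context, Auto Loader's documented behaviour is that, for formats that do not encode data types (JSON, CSV, XML), it infers every column as a string unless the cloudFiles.inferColumnTypes option is set to true. The toy function below only illustrates that rule in plain Python; it is not Auto Loader's implementation, and the function name infer_schema, the infer_column_types flag, the widening rules, and the sample records are all hypothetical.

```python
import json


def infer_schema(json_lines, infer_column_types=False):
    """Toy schema inference over newline-delimited JSON (illustration only).

    With infer_column_types=False every column is reported as "string",
    mirroring Auto Loader's default for JSON. With True, Python value types
    are mapped to Spark-style type names using hypothetical, simplified rules.
    """
    schema = {}
    for line in json_lines:
        for key, value in json.loads(line).items():
            if not infer_column_types:
                schema[key] = "string"
                continue
            if value is None:
                schema.setdefault(key, None)  # a null says nothing about the type
                continue
            # bool must be tested before int: bool is a subclass of int.
            if isinstance(value, bool):
                seen = "boolean"
            elif isinstance(value, int):
                seen = "bigint"
            elif isinstance(value, float):
                seen = "double"
            else:
                seen = "string"
            previous = schema.get(key)
            if previous is None or previous == seen:
                schema[key] = seen
            elif {previous, seen} == {"bigint", "double"}:
                schema[key] = "double"  # widen integers to doubles
            else:
                schema[key] = "string"  # conflicting types fall back to string
    # Columns that were only ever null default to string.
    return {key: kind or "string" for key, kind in schema.items()}


# Hypothetical records with float and boolean fields.
records = [
    '{"name": "lamp", "price": 9.99, "in_stock": true}',
    '{"name": "desk", "price": 120.0, "in_stock": false}',
]
print(infer_schema(records))
# {'name': 'string', 'price': 'string', 'in_stock': 'string'}
print(infer_schema(records, infer_column_types=True))
# {'name': 'string', 'price': 'double', 'in_stock': 'boolean'}
```

In an actual Auto Loader stream, the corresponding switches are .option("cloudFiles.inferColumnTypes", "true") or explicit hints via cloudFiles.schemaHints.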

Question # 16

A global retail company sells products across multiple categories (e.g., Electronics, Clothing) and regions (e.g., North, South, East, West). The sales team has provided the data engineer with a PySpark DataFrame named sales_df, and the team wants the data engineer to analyze the sales data to help them make strategic decisions.

Category_sales = sales_df.groupBy("category").agg(sum("sales_amount").alias("total_sales_amount"))

Category_sales = sales_df.sum("sales_amount").groupBy("category").alias("total_sales_amount")

Category_sales = sales_df.agg(sum("sales_amount").groupBy("category").alias("total_sales_amount"))

Category_sales = sales_df.groupBy("region").agg(sum("sales_amount").alias("total_sales_amount"))
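In PySpark, per-group aggregation follows the pattern df.groupBy(col).agg(F.sum(...).alias(...)), where F.sum comes from pyspark.sql.functions rather than Python's built-in sum, and the groupBy must come before the aggregate. For readers without a Spark session, the sketch below reproduces the same group-then-aggregate logic in plain Python; the helper name total_sales_by and the sample rows are hypothetical, and the real PySpark call appears only in a comment.

```python
from collections import defaultdict

# Real PySpark form, for reference only (needs a SparkSession and sales_df):
#   from pyspark.sql import functions as F
#   sales_df.groupBy("category").agg(F.sum("sales_amount").alias("total_sales_amount"))


def total_sales_by(rows, key, value="sales_amount"):
    """Plain-Python analogue of groupBy(key).agg(sum(value)).

    Returns one dict per group, in first-seen order, with the summed value
    stored under "total_sales_amount".
    """
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return [{key: group, "total_sales_amount": total} for group, total in totals.items()]


# Hypothetical sample rows standing in for sales_df.
sales = [
    {"category": "Electronics", "region": "North", "sales_amount": 1200.0},
    {"category": "Clothing", "region": "South", "sales_amount": 300.0},
    {"category": "Electronics", "region": "East", "sales_amount": 800.0},
    {"category": "Clothing", "region": "West", "sales_amount": 200.0},
]

print(total_sales_by(sales, "category"))
# [{'category': 'Electronics', 'total_sales_amount': 2000.0}, {'category': 'Clothing', 'total_sales_amount': 500.0}]
print(total_sales_by(sales, "region"))
```

Grouping by "region" instead of "category" runs the same aggregation but answers a different question: totals per region rather than per product category.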

Question # 17

A data engineer needs to conduct exploratory analysis on data residing in a database that is within the company's custom-defined network in the cloud. The data engineer is using SQL for this task.

Which type of SQL Warehouse will enable the data engineer to process large numbers of queries quickly and cost-effectively?

Serverless compute for notebooks

Serverless SQL Warehouse

Classic SQL Warehouse

Pro SQL Warehouse

Question # 18

What is the maximum output supported by a job cluster to ensure a notebook does not fail?

10 MB

25 MB

30 MB

15 MB

Question # 19

Which of the following describes a benefit of creating an external table from Parquet rather than CSV when using a CREATE TABLE AS SELECT statement?

Parquet files can be partitioned

CREATE TABLE AS SELECT statements cannot be used on files

Parquet files have a well-defined schema

Parquet files have the ability to be optimized

Parquet files will become Delta tables
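The contrast behind this question is that a CSV file stores only text, so column types must be supplied or inferred by whoever reads it, whereas Parquet stores a schema alongside the data. The standard-library sketch below shows the CSV half of that contrast: values written as an int, a float, and a bool come back as strings. The table and column names are hypothetical, and reading Parquet would need an extra library (for example pyarrow), so that half is not shown.

```python
import csv
import io

# Write a hypothetical two-row table containing int, float and bool values.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["store_id", "revenue", "is_open"])
writer.writerow([1, 1234.5, True])
writer.writerow([2, 99.0, False])

# Read it back: the file carries no type information, only text.
buffer.seek(0)
rows = list(csv.reader(buffer))
header, data = rows[0], rows[1:]

# Collect the Python type names observed in each column.
column_types = {name: {type(row[i]).__name__ for row in data} for i, name in enumerate(header)}
print(data)          # [['1', '1234.5', 'True'], ['2', '99.0', 'False']]
print(column_types)  # {'store_id': {'str'}, 'revenue': {'str'}, 'is_open': {'str'}}
```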

Question # 20

A data engineer needs to apply custom logic to the string column city in the table stores for a specific use case. In order to apply this custom logic at scale, the data engineer wants to create a SQL user-defined function (UDF).

Which of the following code blocks creates this SQL UDF?
