
Databricks Certified Data Engineer Associate Exam (Databricks-Certified-Data-Engineer-Associate)

A data engineer is migrating pipeline tasks to reduce operational toil. The workspace uses Unity Catalog and is in a region that supports serverless. The engineer wants Databricks to auto-select instance types, manage scaling, apply Photon, and handle runtime upgrades automatically for job runs.

How should the data engineer meet this requirement while adhering to Databricks constraints?

A.

Use a Pro SQL warehouse and schedule Python notebook tasks to execute as pipeline steps.

B.

Use an all-purpose cluster with cluster policies to enforce standard sizes and enable autoscaling.

C.

Create a job with a single-task job cluster and manually set the instance families and minimum/maximum workers.

D.

Run the job on serverless compute for workflows, ensuring Unity Catalog is enabled and regional support is available.
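For context, a minimal sketch of the serverless approach described in option D, using the Databricks SDK for Python (the job name and notebook path are hypothetical): omitting any cluster specification on a task makes the run use serverless compute for workflows, where Databricks picks instance types, scaling, Photon, and the runtime.

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

# With no new_cluster / existing_cluster_id / job_cluster_key on the task,
# the run uses serverless compute for workflows (requires Unity Catalog and
# a supported region); Databricks manages instances, scaling, Photon, and
# runtime upgrades.
created = w.jobs.create(
    name="nightly-etl-serverless",  # hypothetical job name
    tasks=[
        jobs.Task(
            task_key="ingest",
            notebook_task=jobs.NotebookTask(
                notebook_path="/Workspace/etl/ingest"  # hypothetical path
            ),
        )
    ],
)
print(created.job_id)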

Which of the following describes when to use the CREATE STREAMING LIVE TABLE (formerly CREATE INCREMENTAL LIVE TABLE) syntax over the CREATE LIVE TABLE syntax when creating Delta Live Tables (DLT) tables using SQL?

A.

CREATE STREAMING LIVE TABLE should be used when the subsequent step in the DLT pipeline is static.

B.

CREATE STREAMING LIVE TABLE should be used when data needs to be processed incrementally.

C.

CREATE STREAMING LIVE TABLE is redundant for DLT and it does not need to be used.

D.

CREATE STREAMING LIVE TABLE should be used when data needs to be processed through complicated aggregations.

E.

CREATE STREAMING LIVE TABLE should be used when the previous step in the DLT pipeline is static.
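For illustration, the same distinction expressed in DLT's Python API, as a minimal sketch (table names and the source path are hypothetical; spark is predefined inside a DLT pipeline): the streaming table processes only newly arrived records on each update (option B's incremental case), while the plain live table is recomputed in full.

import dlt

# Streaming table (CREATE STREAMING LIVE TABLE in SQL): reads the source
# incrementally, processing only records added since the last update.
@dlt.table(name="raw_events")
def raw_events():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/input/path")  # hypothetical source path
    )

# Live table (CREATE LIVE TABLE in SQL): recomputed in full on each update,
# suitable when results depend on the complete input.
@dlt.table(name="events_by_type")
def events_by_type():
    return dlt.read("raw_events").groupBy("event_type").count()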

A data engineer needs to provide access to a group named manufacturing-team. The team needs privileges to create tables in the production schema.

Which set of SQL commands grants the manufacturing-team group the ability to create tables in a schema named production, under the parent catalog manufacturing, with the least privileges? (A least-privilege sketch follows the options below.)

(Options A through D were presented as SQL code images and are not reproduced here.)

A.

Option A

B.

Option B

C.

Option C

D.

Option D
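Although the original options were images, the least-privilege pattern itself is standard: the group needs USE CATALOG on the catalog, plus USE SCHEMA and CREATE TABLE on the schema, and nothing broader. A minimal sketch, run from a notebook by an owner or a sufficiently privileged principal:

# Least-privilege grants for creating tables in manufacturing.production.
for stmt in [
    "GRANT USE CATALOG ON CATALOG manufacturing TO `manufacturing-team`",
    "GRANT USE SCHEMA ON SCHEMA manufacturing.production TO `manufacturing-team`",
    "GRANT CREATE TABLE ON SCHEMA manufacturing.production TO `manufacturing-team`",
]:
    spark.sql(stmt)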

A data engineer is decommissioning a sandbox schema in Unity Catalog. Some tables are ephemeral staging outputs that can be safely removed entirely, but a few tables point at shared cloud storage used by downstream jobs outside Databricks. The engineer must avoid deleting any shared files when cleaning up catalog objects.

How does Unity Catalog behave when dropping managed vs. external tables?

A.

Drop all tables; Databricks will only remove metadata for both managed and external tables

B.

Drop managed tables that are ephemeral and drop external tables; files for both remain for 7 days

C.

Drop managed staging tables to remove data and metadata, and drop external tables to remove only metadata

D.

Drop external tables first to delete their files, then drop managed tables to keep their files for recovery
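A short sketch of the behavior described in option C (table names are hypothetical): dropping a managed table removes both the metadata and the files Unity Catalog manages, while dropping an external table removes only the catalog entry and leaves the files at the external location untouched.

# Managed table: DROP removes the metadata and, for Unity Catalog managed
# tables, deletes the underlying data files that the metastore manages.
spark.sql("DROP TABLE sandbox.staging.tmp_orders")      # hypothetical name

# External table: DROP removes only the catalog entry; the files at the
# registered external LOCATION remain for downstream consumers.
spark.sql("DROP TABLE sandbox.shared.orders_external")  # hypothetical name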

A data engineer needs to use a Delta table as part of a data pipeline, but they do not know if they have the appropriate permissions.

In which location can the data engineer review their permissions on the table?

A.

Jobs

B.

Dashboards

C.

Catalog Explorer

D.

Repos
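Besides browsing the table's Permissions tab in Catalog Explorer, the same information can be pulled programmatically; a minimal sketch with a hypothetical table name:

# Lists the privileges granted on the table, equivalent to what the
# Permissions tab in Catalog Explorer displays.
spark.sql(
    "SHOW GRANTS ON TABLE main.pipeline.delta_orders"  # hypothetical name
).show(truncate=False)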

What is the primary function of the Silver layer in the Databricks medallion architecture?

A.

Ingest raw data in its original state

B.

Validate, clean, and deduplicate data for further processing

C.

Aggregate and enrich data for business analytics

D.

Store historical data solely for auditing purposes
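As a concrete illustration of option B, a minimal bronze-to-silver step (table and column names are hypothetical): validate, conform types, and deduplicate before the data moves on to gold-level aggregation.

from pyspark.sql import functions as F

bronze = spark.read.table("lake.bronze.events")  # hypothetical bronze table

silver = (
    bronze
    .filter(F.col("event_id").isNotNull())                # validate
    .withColumn("event_ts", F.to_timestamp("event_ts"))   # clean / conform
    .dropDuplicates(["event_id"])                         # deduplicate
)

silver.write.mode("overwrite").saveAsTable("lake.silver.events")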

A data engineer needs to combine sales data from an on-premises PostgreSQL database with customer data in Azure Synapse for a comprehensive report. The goal is to avoid data duplication and ensure up-to-date information.

How should the data engineer achieve this using Databricks?

A.

Develop custom ETL pipelines to ingest data into Databricks

B.

Use Lakehouse Federation to query both data sources directly

C.

Manually synchronize data from both sources into a single database

D.

Export data from both sources to CSV files and upload them to Databricks
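A sketch of the federated approach in option B, assuming connections and foreign catalogs (here pg_sales for PostgreSQL and syn_customers for Azure Synapse, both hypothetical) have already been created with Lakehouse Federation: the join runs against the live sources, with no copies to keep in sync.

# Query both external systems in place through their foreign catalogs;
# catalog, schema, and table names below are hypothetical.
report = spark.sql("""
    SELECT c.customer_id,
           c.segment,
           SUM(s.amount) AS total_sales
    FROM pg_sales.public.sales AS s
    JOIN syn_customers.dbo.customers AS c
      ON c.customer_id = s.customer_id
    GROUP BY c.customer_id, c.segment
""")
report.show()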

A data engineer is working on a Databricks project that uses cloud storage. The data engineer wants to load several JSON files from containers in a storage account as soon as each file arrives in the storage account.

Which syntax should the data engineer follow to first load the files into a DataFrame and check that it is working as expected, using Python?

A.

df = spark.readStream.format( " json " ).load( " input/path " )

B.

df = spark.readStream.format( " cloud " ),option( " json " ).load( " /input/path " )

C.

df = spark.readStream.format( " cloudFiles " ) .option( " cloudFiles.format " , " json " ) .load( " /input/path " )

D.

df = spark.read.json( " inp i./path " )
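For context, option C's Auto Loader pattern extended into a complete ingest sketch (the checkpoint path and target table are hypothetical): cloudFiles discovers files as they land, and the availableNow trigger processes everything that has arrived and then stops.

# Auto Loader: incrementally discover and read new JSON files as they
# arrive in the storage location.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .load("/input/path")
)

# Quick sanity check that the stream parsed the expected columns.
df.printSchema()

(
    df.writeStream
    .option("checkpointLocation", "/chk/json_ingest")  # hypothetical path
    .trigger(availableNow=True)  # process all pending files, then stop
    .toTable("sandbox.staging.raw_json")  # hypothetical target table
)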

A Python file is ready to go into production, and the client wants to use the cheapest but most efficient type of cluster possible. The workload is quite small, processing only 10 GB of data with simple joins and no complex aggregations or wide transformations.

Which cluster meets the requirement?

A.

Job cluster with Photon enabled

B.

Interactive cluster

C.

Job cluster with spot instances disabled

D.

Job cluster with spot instances enabled
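For illustration, a hypothetical job-cluster spec matching option D, sketched with the Databricks SDK for Python (AWS shown; the runtime version and node type are assumptions): spot instances with on-demand fallback keep cost down for a small, simple workload.

from databricks.sdk.service import compute

# Small job cluster using spot instances with on-demand fallback;
# the runtime version and node type below are assumptions.
cluster_spec = compute.ClusterSpec(
    spark_version="15.4.x-scala2.12",
    node_type_id="m5.large",  # small nodes suffice for a 10 GB workload
    num_workers=2,
    aws_attributes=compute.AwsAttributes(
        availability=compute.AwsAvailability.SPOT_WITH_FALLBACK
    ),
)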

A data engineer is maintaining ETL pipeline code in a GitHub repository linked to their Databricks account. The data engineer wants to deploy the ETL pipeline to production as a Databricks Workflow.

Which approach should the data engineer use?

A.

Databricks Asset Bundles (DAB) + GitHub Integration

B.

Maintain workflow_config.json and deploy it using the Databricks CLI

C.

Manually create and manage the workflow in the UI

D.

Maintain workflow_config.json and deploy it using Terraform