
Databricks-Certified-Professional-Data-Engineer: Databricks Certified Data Engineer Professional Exam

A senior data engineer is planning large-scale data workflows. The task is to identify the considerations that form the foundation for building scalable data models over large datasets. The team has listed Delta Lake capabilities and wants to determine which item should not be treated as a core factor.

Which listed item can be ignored when evaluating Delta Lake?

A.

Delta Lake’s ability to process data in both batch and streaming modes seamlessly, providing flexibility in ingestion and processing.

B.

Delta Lake works with various data formats (Parquet, JSON, CSV) and integrates well with Spark and Databricks tools.

C.

Delta Lake optimizes metadata handling, efficiently managing billions of files and facilitating scalability to petabyte-scale datasets.

D.

Delta Lake provides limited support for monitoring and troubleshooting data pipelines, so relevant partner tools must be identified and set up for operational efficiency.
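
For context on option A, a minimal PySpark sketch (the events table name is hypothetical) showing the same Delta table serving both batch and streaming reads:

# Batch read of a Delta table
df = spark.read.table("events")

# Incremental streaming read of the same table
stream_df = spark.readStream.table("events")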

Which statement regarding Spark configuration on the Databricks platform is true?

A.

Spark configuration properties set for an interactive cluster with the Clusters UI will impact all notebooks attached to that cluster.

B.

When the same Spark configuration property is set for an interactive cluster and a notebook attached to that cluster, the notebook setting will always be ignored.

C.

Spark configuration set within a notebook will affect all SparkSessions attached to the same interactive cluster.

D.

The Databricks REST API can be used to modify the Spark configuration properties for an interactive cluster without interrupting jobs.
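
As a hedged illustration of the distinction behind options A and C: properties set in the Clusters UI apply to every notebook attached to the cluster, while spark.conf.set in a notebook is scoped to that notebook's own SparkSession.

# Session-scoped setting; affects only the SparkSession bound to this notebook
spark.conf.set("spark.sql.shuffle.partitions", "64")

# Read back the effective value for this session
print(spark.conf.get("spark.sql.shuffle.partitions"))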

A data engineer has configured their Databricks Asset Bundle with multiple targets in databricks.yml and deployed it to the production workspace. Now, to validate the deployment, they need to invoke a job named my_project_job specifically within the prod target context. Assuming the job is already deployed, they need to trigger its execution while ensuring the target-specific configuration is respected.

Which command will trigger the job execution?

A.

databricks execute my_project_job -e prod

B.

databricks job run my_project_job --env prod

C.

databricks run my_project_job -t prod

D.

databricks bundle run my_project_job -t prod
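
A minimal sketch of the correct invocation (option D), assuming a databricks.yml that defines a prod target and a deployed job resource named my_project_job; the validate step is an optional pre-check:

databricks bundle validate -t prod
databricks bundle run my_project_job -t prod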

A transactions table has been liquid clustered on the columns product_id, user_id, and event_date.

Which operation lacks support for clustering on write?

A.

spark.writeStream.format("delta").mode("append")

B.

CTAS and RTAS statements

C.

INSERT INTO operations

D.

spark.write.format("delta").mode("append")
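
For reference, a hedged sketch of how such a table might be declared (raw_transactions is a hypothetical source); per the Databricks documentation, CTAS/RTAS, INSERT INTO, and batch appends can cluster on write, while Structured Streaming writes do not:

# Declare liquid clustering keys at creation time via CTAS
spark.sql("""
  CREATE TABLE transactions
  CLUSTER BY (product_id, user_id, event_date)
  AS SELECT * FROM raw_transactions
""")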

A healthcare analytics team is implementing a dimensional model in Delta Lake for patient care analysis. They have a date dimension table and are evaluating design options to ensure it supports a wide range of time-based analyses.

Which design approach for the date dimension will support efficient time-based querying and aggregation?

A.

Store the date as a string in the format YYYY-MM-DD for readability.

B.

Create separate dimension tables for different calendar systems (fiscal, academic, etc.).

C.

Store only the date value and calculate all time attributes dynamically in queries.

D.

Pre-calculate attributes like fiscal_period, quarter, month_name, day_of_week, and holiday.
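
A minimal PySpark sketch of option D's approach, pre-computing calendar attributes into the date dimension (names are illustrative; fiscal and holiday attributes would typically be joined in from business calendars):

from pyspark.sql import functions as F

# One row per calendar date across the modeled range
dates = spark.sql("SELECT explode(sequence(to_date('2020-01-01'), to_date('2030-12-31'))) AS date")

dim_date = (dates
    .withColumn("quarter", F.quarter("date"))
    .withColumn("month_name", F.date_format("date", "MMMM"))
    .withColumn("day_of_week", F.date_format("date", "EEEE")))

dim_date.write.format("delta").saveAsTable("dim_date")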

Assuming that the Databricks CLI has been installed and configured correctly, which Databricks CLI command can be used to upload a custom Python wheel to object storage mounted with DBFS for use with a production job?

A.

configure

B.

fs

C.

jobs

D.

libraries

E.

workspace
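
A hedged usage example of the fs command group (paths are hypothetical):

databricks fs cp ./dist/my_package-0.1.0-py3-none-any.whl dbfs:/mnt/prod/libs/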

A new data engineer notices that a critical field was omitted from an application that writes its Kafka source to Delta Lake, even though the field was present in the Kafka source. The field is also missing from data written to dependent, long-term storage. The retention threshold on the Kafka service is seven days, and the pipeline has been in production for three months.

Which describes how Delta Lake can help to avoid data loss of this nature in the future?

A.

The Delta log and Structured Streaming checkpoints record the full history of the Kafka producer.

B.

Delta Lake schema evolution can retroactively calculate the correct value for newly added fields, as long as the data was in the original source.

C.

Delta Lake automatically checks that all fields present in the source data are included in the ingestion layer.

D.

Data can never be permanently dropped or deleted from Delta Lake, so data loss is not possible under any circumstance.

E.

Ingesting all raw data and metadata from Kafka to a bronze Delta table creates a permanent, replayable history of the data state.
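
A minimal sketch of the bronze pattern in option E, persisting the unparsed Kafka value together with its metadata so that omitted fields can be re-derived later (broker, topic, and paths are hypothetical):

raw = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load())

# Keep the raw payload and Kafka metadata; parsing happens downstream
(raw.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)",
                "topic", "partition", "offset", "timestamp")
    .writeStream
    .option("checkpointLocation", "/mnt/bronze/_checkpoints/events")
    .toTable("bronze_events"))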

The DevOps team has configured a production workload as a collection of notebooks scheduled to run daily using the Jobs UI. A new data engineering hire is onboarding to the team and has requested access to one of these notebooks to review the production logic.

What are the maximum notebook permissions that can be granted to the user without allowing accidental changes to production code or data?

A.

Can manage

B.

Can edit

C.

Can run

D.

Can read

Which statement describes the correct use of pyspark.sql.functions.broadcast?

A.

It marks a column as having low enough cardinality to properly map distinct values to available partitions, allowing a broadcast join.

B.

It marks a column as small enough to store in memory on all executors, allowing a broadcast join.

C.

It caches a copy of the indicated table on attached storage volumes for all active clusters within a Databricks workspace.

D.

It marks a DataFrame as small enough to store in memory on all executors, allowing a broadcast join.

E.

It caches a copy of the indicated table on all nodes in the cluster for use in all future queries during the cluster lifetime.
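
A minimal sketch of the usage described in option D (the DataFrames are hypothetical):

from pyspark.sql.functions import broadcast

# broadcast() hints that dim_products is small enough to copy to every
# executor, so the large transactions side avoids a shuffle
result = transactions.join(broadcast(dim_products), on="product_id")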

A data engineer has a Delta table orders with deletion vectors enabled. The engineer executes the following command:

DELETE FROM orders WHERE status = 'cancelled';

What is the behavior of deletion vectors when the command is executed?

A.

Rows are marked as deleted both in metadata and in files.

B.

Delta automatically removes all cancelled orders permanently.

C.

Files are physically rewritten without the deleted rows.

D.

Rows are marked as deleted in metadata, not in files.
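
For reference, a hedged SQL sketch: with deletion vectors enabled, DELETE records deleted rows in metadata rather than rewriting data files (option D), and a later REORG can physically purge them:

-- Enable deletion vectors (already on for this table per the question)
ALTER TABLE orders SET TBLPROPERTIES ('delta.enableDeletionVectors' = 'true');

-- Optionally rewrite files later to remove soft-deleted rows
REORG TABLE orders APPLY (PURGE);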