
Databricks Databricks-Certified-Professional-Data-Engineer - Databricks Certified Data Engineer Professional Exam

The data architect has decided that once data has been ingested from external sources into the Databricks Lakehouse, table access controls will be leveraged to manage permissions for all production tables and views.

The following logic was executed to grant privileges for interactive queries on a production database to the core engineering group.

GRANT USAGE ON DATABASE prod TO eng;

GRANT SELECT ON DATABASE prod TO eng;

Assuming these are the only privileges that have been granted to the eng group and that these users are not workspace administrators, which statement describes their privileges?

A.

Group members have full permissions on the prod database and can also assign permissions to other users or groups.

B.

Group members are able to list all tables in the prod database but are not able to see the results of any queries on those tables.

C.

Group members are able to query and modify all tables and views in the prod database, but cannot create new tables or views.

D.

Group members are able to query all tables and views in the prod database, but cannot create or edit anything in the database.

E.

Group members are able to create, query, and modify all tables and views in the prod database, but cannot define custom functions.

An external object storage container has been mounted to the location /mnt/finance_eda_bucket.

The following logic was executed to create a database for the finance team:

After the database was successfully created and permissions configured, a member of the finance team runs the following code:

If all users on the finance team are members of the finance group, which statement describes how the tx_sales table will be created?

A.

A logical table will persist the query plan to the Hive Metastore in the Databricks control plane.

B.

An external table will be created in the storage container mounted to /mnt/finance_eda_bucket.

C.

A logical table will persist the physical plan to the Hive Metastore in the Databricks control plane.

D.

A managed table will be created in the storage container mounted to /mnt/finance_eda_bucket.

E.

A managed table will be created in the DBFS root storage container.

The following table consists of items found in user carts within an e-commerce website.

The following MERGE statement is used to update this table using an updates view, with schema evolution enabled on this table.

How would the following update be handled?

A.

The update is moved to a separate "restored" column because it is missing a column expected in the target schema.

B.

The new restored field is added to the target schema, and dynamically read as NULL for existing unmatched records.

C.

The update throws an error because changes to existing columns in the target schema are not supported.

D.

The new nested field is added to the target schema, and files underlying existing records are updated to include NULL values for the new field.

The data engineering team is migrating an enterprise system with thousands of tables and views into the Lakehouse. They plan to implement the target architecture using a series of bronze, silver, and gold tables. Bronze tables will almost exclusively be used by production data engineering workloads, while silver tables will be used to support both data engineering and machine learning workloads. Gold tables will largely serve business intelligence and reporting purposes. While personal identifying information (PII) exists in all tiers of data, pseudonymization and anonymization rules are in place for all data at the silver and gold levels.

The organization is interested in reducing security concerns while maximizing the ability to collaborate across diverse teams.

Which statement exemplifies best practices for implementing this system?

A.

Isolating tables in separate databases based on data quality tiers allows for easy permissions management through database ACLs and allows physical separation of default storage locations for managed tables.

B.

Because databases on Databricks are merely a logical construct, choices around database organization do not impact security or discoverability in the Lakehouse.

C.

Storing all production tables in a single database provides a unified view of all data assets available throughout the Lakehouse, simplifying discoverability by granting all users view privileges on this database.

D.

Working in the default Databricks database provides the greatest security when working with managed tables, as these will be created in the DBFS root.

E.

Because all tables must live in the same storage containers used for the database they're created in, organizations should be prepared to create between dozens and thousands of databases depending on their data isolation requirements.

A data engineer wants to refactor the following DLT code, which includes multiple table definitions with very similar code:

In an attempt to programmatically create these tables using a parameterized table definition, the data engineer writes the following code.

The pipeline runs an update with this refactored code, but generates a different DAG showing incorrect configuration values for tables.

How can the data engineer fix this?

A.

Convert the list of configuration values to a dictionary of table settings, using table names as keys.

B.

Convert the list of configuration values to a dictionary of table settings, using a different input to the for loop.

C.

Load the configuration values for these tables from a separate file, located at a path provided by a pipeline parameter.

D.

Wrap the loop inside another table definition, using generalized names and properties to replace with those from the inner table.
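The code for this question is not reproduced above. As a general Python illustration only, and assuming the symptom comes from how functions defined inside a loop capture loop variables, the sketch below shows the late-binding pitfall and one common way around it. The `register` decorator, the `registry` dictionary, and the configuration values are hypothetical stand-ins, not the DLT API or the engineer's actual code.

```python
# Hypothetical stand-in for a table-registering decorator (NOT the DLT API).
registry = {}


def register(name):
    """Store the decorated function under the given table name."""
    def decorator(fn):
        registry[name] = fn
        return fn
    return decorator


# Hypothetical configuration values: (table name, start date).
configs = [("orders", "2024-01-01"), ("customers", "2024-02-01")]

# Pitfall: the body looks up `start_date` when the function is CALLED,
# which happens after the loop has finished, so every function sees
# the value from the last iteration.
for table_name, start_date in configs:
    @register(f"late_{table_name}")
    def late_bound():
        return start_date

# One common fix: bind the current value as a default argument, which is
# evaluated when the function is DEFINED.
for table_name, start_date in configs:
    @register(f"bound_{table_name}")
    def early_bound(start_date=start_date):
        return start_date

late_results = {n: fn() for n, fn in registry.items() if n.startswith("late_")}
bound_results = {n: fn() for n, fn in registry.items() if n.startswith("bound_")}
```

In this sketch, every entry in `late_results` carries the last configuration value, while `bound_results` keeps one distinct value per table. Whether the same mechanism explains the DAG in the question depends on the missing code.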

A transactions table has been liquid clustered on the columns product_id, user_id, and event_date.

Which operation lacks support for cluster on write?

A.

spark.writeStream.format('delta').mode('append')

B.

CTAS and RTAS statements

C.

INSERT INTO operations

D.

spark.write.format('delta').mode('append')

Which distribution does Databricks support for installing custom Python code packages?

A.

sbt

B.

CRAN

C.

CRAM

D.

npm

E.

Wheels

F.

jars

Incorporating unit tests into a PySpark application requires upfront attention to the design of your jobs, or a potentially significant refactoring of existing code.

Which statement describes a main benefit that offsets this additional effort?

A.

Improves the quality of your data

B.

Validates a complete use case of your application

C.

Troubleshooting is easier since all steps are isolated and tested individually

D.

Yields faster deployment and execution times

E.

Ensures that all steps interact correctly to achieve the desired end result
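As a rough sketch of the design effort the question stem refers to, the example below splits a job into small functions and tests each one on its own. To stay runnable without a Spark session, it uses plain Python dictionaries in place of DataFrame rows. The function names, fields, and threshold are hypothetical and chosen only for illustration.

```python
import unittest


def parse_amount(row):
    """Step 1: convert the raw 'amount' string to a float."""
    return {**row, "amount": float(row["amount"])}


def flag_large(row, threshold=100.0):
    """Step 2: mark rows whose amount is strictly above the threshold."""
    return {**row, "is_large": row["amount"] > threshold}


class StepTests(unittest.TestCase):
    """Each step has its own test, so a failure points at a single function."""

    def test_parse_amount(self):
        parsed = parse_amount({"id": 1, "amount": "12.5"})
        self.assertEqual(parsed["amount"], 12.5)

    def test_flag_large(self):
        self.assertTrue(flag_large({"id": 1, "amount": 150.0})["is_large"])
        self.assertFalse(flag_large({"id": 2, "amount": 100.0})["is_large"])


def run_tests():
    """Run the step tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(StepTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests()` executes both tests. In a PySpark job, the same structure would typically apply to functions that take and return DataFrames, with a small local Spark session created in the test setup.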