Weekend Sale Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: xmas50

Databricks Databricks-Certified-Data-Analyst-Associate - Databricks Certified Data Analyst Associate Exam

A data analyst has created a Query in Databricks SQL, and now they want to create two data visualizations from that Query and add both of those data visualizations to the same Databricks SQL Dashboard.

Which of the following steps will they need to take when creating and adding both data visualizations to the Databricks SQL Dashboard?

A.

They will need to alter the Query to return two separate sets of results.

B.

They will need to add two separate visualizations to the dashboard based on the same Query.

C.

They will need to create two separate dashboards.

D.

They will need to decide on a single data visualization to add to the dashboard.

E.

They will need to copy the Query and create one data visualization per query.

A data analyst has created a user-defined function using the following line of code:

CREATE FUNCTION price(spend DOUBLE, units DOUBLE)

RETURNS DOUBLE

RETURN spend / units;

Which of the following code blocks can be used to apply this function to the customer_spend and customer_units columns of the table customer_summary to create column customer_price?

A.

SELECT PRICE customer_spend, customer_units AS customer_price FROM customer_summary

B.

SELECT price FROM customer_summary

C.

SELECT function(price(customer_spend, customer_units)) AS customer_price FROM customer_summary

D.

SELECT double(price(customer_spend, customer_units)) AS customer_price FROM customer_summary

E.

SELECT price(customer_spend, customer_units) AS customer_price FROM customer_summary

A stakeholder has provided a data analyst with a lookup dataset in the form of a 50-row CSV file. The data analyst needs to upload this dataset for use as a table in Databricks SQL.

Which approach should the data analyst use to quickly upload the file into a table for use in Databricks SOL?

A.

Create a table by uploading the file using the Create page within Databricks SQL

B.

Create a table via a connection between Databricks and the desktop facilitated by Partner Connect.

C.

Create a table by uploading the file to cloud storage and then importing the data to Databricks.

D.

Create a table by manually copying and pasting the data values into cloud storage and then importing the data to Databricks.

Consider the following two statements:

Statement 1:

Statement 2:

Which of the following describes how the result sets will differ for each statement when they are run in Databricks SQL?

A.

The first statement will return all data from the customers table and matching data from the orders table. The second statement will return all data from the orders table and matching data from the customers table. Any missing data will be filled in with NULL.

B.

When the first statement is run, only rows from the customers table that have at least one match with the orders table on customer_id will be returned. When the second statement is run, only those rows in the customers table that do not have at least one match with the orders table on customer_id will be returned.

C.

There is no difference between the result sets for both statements.

D.

Both statements will fail because Databricks SQL does not support those join types.

E.

When the first statement is run, all rows from the customers table will be returned and only the customer_id from the orders table will be returned. When the second statement is run, only those rows in the customers table that do not have at least one match with the orders table on customer_id will be returned.

A data analyst has set up a SQL query to run every four hours on a SQL endpoint, but the SQL endpoint is taking too long to start up with each run.

Which of the following changes can the data analyst make to reduce the start-up time for the endpoint while managing costs?

A.

Reduce the SQL endpoint cluster size

B.

Increase the SQL endpoint cluster size

C.

Turn off the Auto stop feature

D.

Increase the minimum scaling value

E.

Use a Serverless SQL endpoint

What describes Partner Connect in Databricks?

A.

it allows for free use of Databricks partner tools through a common API.

B.

it allows multi-directional connection between Databricks and Databricks partners easier.

C.

It exposes connection information to third-party tools via Databricks partners.

D.

It is a feature that runs Databricks partner tools on a Databricks SQL Warehouse (formerly known as a SQL endpoint).

Which of the following statements describes descriptive statistics?

A.

A branch of statistics that uses summary statistics to quantitatively describe and summarize data.

B.

A branch of statistics that uses a variety of data analysis techniques to infer properties of an underlying distribution of probability.

C.

A branch of statistics that uses quantitative variables that must take on a finite or countably infinite set of values.

D.

A branch of statistics that uses summary statistics to categorically describe and summarize data.

E.

A branch of statistics that uses quantitative variables that must take on an uncountable set of values.

A data analyst has been asked to produce a visualization that shows the flow of users through a website.

Which of the following is used for visualizing this type of flow?

A.

Heatmap

B.

IChoropleth

C.

Word Cloud

D.

Pivot Table

E.

Sankey

How can a data analyst determine if query results were pulled from the cache?

A.

Go to the Query History tab and click on the text of the query. The slideout shows if the results came from the cache.

B.

Go to the Alerts tab and check the Cache Status alert.

C.

Go to the Queries tab and click on Cache Status. The status will be green if the results from the last run came from the cache.

D.

Go to the SQL Warehouse (formerly SQL Endpoints) tab and click on Cache. The Cache file will show the contents of the cache.

E.

Go to the Data tab and click Last Query. The details of the query will show if the results came from the cache.

A data organization has a team of engineers developing data pipelines following the medallion architecture using Delta Live Tables. While the data analysis team working on a project is using gold-layer tables from these pipelines, they need to perform some additional processing of these tables prior to performing their analysis.

Which of the following terms is used to describe this type of work?

A.

Data blending

B.

Last-mile

C.

Data testing

D.

Last-mile ETL

E.

Data enhancement