Databricks Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 - Databricks Certified Associate Developer for Apache Spark 3.0 Exam

Databricks Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Premium Access Download Demo

Page: 5 / 6
Total 180 questions

The code block displayed below contains an error. The code block is intended to join DataFrame itemsDf with the larger DataFrame transactionsDf on column itemId. Find the error.

Code block:

transactionsDf.join(itemsDf, "itemId", how="broadcast")

The syntax is wrong, how= should be removed from the code block.

The join method should be replaced by the broadcast method.

Spark will only perform the broadcast operation if this behavior has been enabled on the Spark cluster.

The larger DataFrame transactionsDf is being broadcasted, rather than the smaller DataFrame itemsDf.

broadcast is not a valid join type.

Question # 42

Which of the following describes characteristics of the Spark UI?

Via the Spark UI, workloads can be manually distributed across executors.

Via the Spark UI, stage execution speed can be modified.

The Scheduler tab shows how jobs that are run in parallel by multiple users are distributed across the cluster.

There is a place in the Spark UI that shows the property spark.executor.memory.

Some of the tabs in the Spark UI are named Jobs, Stages, Storage, DAGs, Executors, and SQL.

Question # 43

Which of the following describes characteristics of the Spark driver?

The Spark driver requests the transformation of operations into DAG computations from the worker nodes.

If set in the Spark configuration, Spark scales the Spark driver horizontally to improve parallel processing performance.

The Spark driver processes partitions in an optimized, distributed fashion.

In a non-interactive Spark application, the Spark driver automatically creates the SparkSession object.

The Spark driver's responsibility includes scheduling queries for execution on worker nodes.

Question # 44

Which of the following is the deepest level in Spark's execution hierarchy?

Job

Task

Executor

Slot

Stage

Question # 45

Which of the following is a characteristic of the cluster manager?

Each cluster manager works on a single partition of data.

The cluster manager receives input from the driver through the SparkContext.

The cluster manager does not exist in standalone mode.

The cluster manager transforms jobs into DAGs.

In client mode, the cluster manager runs on the edge node.

Question # 46

The code block displayed below contains an error. The code block is intended to perform an outer join of DataFrames transactionsDf and itemsDf on columns productId and itemId, respectively.

Find the error.

Code block:

transactionsDf.join(itemsDf, [itemsDf.itemId, transactionsDf.productId], "outer")

The "outer" argument should be eliminated, since "outer" is the default join type.

The join type needs to be appended to the join() operator, like join().outer() instead of listing it as the last argument inside the join() call.

The term [itemsDf.itemId, transactionsDf.productId] should be replaced by itemsDf.itemId == transactionsDf.productId.

The term [itemsDf.itemId, transactionsDf.productId] should be replaced by itemsDf.col("itemId") == transactionsDf.col("productId").

The "outer" argument should be eliminated from the call and join should be replaced by joinOuter.

Question # 47

Which of the following code blocks uses a schema fileSchema to read a parquet file at location filePath into a DataFrame?

spark.read.schema(fileSchema).format("parquet").load(filePath)

spark.read.schema("fileSchema").format("parquet").load(filePath)

spark.read().schema(fileSchema).parquet(filePath)

spark.read().schema(fileSchema).format(parquet).load(filePath)

spark.read.schema(fileSchema).open(filePath)

Question # 48

Which of the following is not a feature of Adaptive Query Execution?

Replace a sort merge join with a broadcast join, where appropriate.

Coalesce partitions to accelerate data processing.

Split skewed partitions into smaller partitions to avoid differences in partition processing time.

Reroute a query in case of an executor failure.

Collect runtime statistics during query execution.

Question # 49

Which of the following code blocks efficiently converts DataFrame transactionsDf from 12 into 24 partitions?

transactionsDf.repartition(24, boost=True)

transactionsDf.repartition()

transactionsDf.repartition("itemId", 24)

transactionsDf.coalesce(24)

transactionsDf.repartition(24)

Question # 50

The code block displayed below contains multiple errors. The code block should return a DataFrame that contains only columns transactionId, predError, value and storeId of DataFrame

transactionsDf. Find the errors.

Code block:

transactionsDf.select([col(productId), col(f)])

Sample of transactionsDf:

1.+-------------+---------+-----+-------+---------+----+

3.+-------------+---------+-----+-------+---------+----+

4.| 1| 3| 4| 25| 1|null|

5.| 2| 6| 7| 2| 2|null|

6.| 3| 3| null| 25| 3|null|

7.+-------------+---------+-----+-------+---------+----+

The column names should be listed directly as arguments to the operator and not as a list.

The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all column names should be expressed

as strings without being wrapped in a col() operator.

The select operator should be replaced by a drop operator.

The column names should be listed directly as arguments to the operator and not as a list and following the pattern of how column names are expressed in the code block, columns productId and

f should be replaced by transactionId, predError, value and storeId.

The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all col() operators should be removed.

Explanation:

Explanation

Correct code block: transactionsDf.drop("productId", "f")

This QUESTION NO: requires a lot of thinking to get right. For solving it, you may take advantage of the digital notepad that is provided to you during the test. You have probably seen that the code

block

includes multiple errors. In the test, you are usually confronted with a code block that only contains a single error. However, since you are practicing here, this challenging multi-error QUESTION

NO: will

make it easier for you to deal with single-error questions in the real exam.

The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all column names should be expressed as

strings without being wrapped in a col() operator.

Correct! Here, you need to figure out the many, many things that are wrong with the initial code block. While the QUESTION NO: can be solved by using a select statement, a drop statement, given

the

answer options, is the correct one. Then, you can read in the documentation that drop does not take a list as an argument, but just the column names that should be dropped. Finally, the column

names should be expressed as strings and not as Python variable names as in the original code block.

The column names should be listed directly as arguments to the operator and not as a list.

Incorrect. While this is a good first step and part of the correct solution (see above), this modification is insufficient to solve the question.

The column names should be listed directly as arguments to the operator and not as a list and following the pattern of how column names are expressed in the code block, columns productId and f

should be replaced by transactionId, predError, value and storeId.

Wrong. If you use the same pattern as in the original code block (col(productId), col(f)), you are still making a mistake. col(productId) will trigger Python to search for the content of a variable named

productId instead of telling Spark to use the column productId - for that, you need to express it as a string.

The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all col() operators should be removed.

No. This still leaves you with Python trying to interpret the column names as Python variables (see above).

The select operator should be replaced by a drop operator.

Wrong, this is not enough to solve the question. If you do this, you will still face problems since you are passing a Python list to drop and the column names are still interpreted as Python variables

(see above).

More info: pyspark.sql.DataFrame.drop â€” PySpark 3.1.2 documentation

Static notebook | Dynamic notebook: See test 3, QUESTION NO: 30 (Databricks import instructions)

Summer Sale Limited Time 65% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: ecus65

Databricks Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 - Databricks Certified Associate Developer for Apache Spark 3.0 Exam

The Answer Is:

Explanation:

The Answer Is:

Explanation:

The Answer Is:

Explanation:

The Answer Is:

Explanation:

The Answer Is:

Explanation:

The Answer Is:

Explanation:

The Answer Is:

Explanation:

The Answer Is:

Explanation:

The Answer Is:

Explanation:

The Answer Is:

Explanation: