Databricks Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 - Databricks Certified Associate Developer for Apache Spark 3.0 Exam

The code block displayed below contains an error. The code block is intended to join DataFrame itemsDf with the larger DataFrame transactionsDf on column itemId. Find the error.

Code block:

transactionsDf.join(itemsDf, "itemId", how="broadcast")

A.

The syntax is wrong, how= should be removed from the code block.

B.

The join method should be replaced by the broadcast method.

C.

Spark will only perform the broadcast operation if this behavior has been enabled on the Spark cluster.

D.

The larger DataFrame transactionsDf is being broadcasted, rather than the smaller DataFrame itemsDf.

E.

broadcast is not a valid join type.
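
As background for the question above, here is a minimal plain-Python sketch of the idea behind a broadcast hash join. It is not Spark code: the function name broadcast_hash_join and the sample rows are hypothetical. A copy of the small table is turned into a lookup table, and every partition of the large table probes it locally. In PySpark itself, a broadcast hint is normally given by wrapping the smaller DataFrame in pyspark.sql.functions.broadcast, while the how argument of join is reserved for join types such as "inner" or "outer".

```python
from typing import Dict, List

Row = Dict[str, object]


def broadcast_hash_join(large_partitions: List[List[Row]],
                        small_rows: List[Row],
                        key: str) -> List[Row]:
    """Inner-join each partition of the large side against one shared lookup table."""
    # Build the lookup table once from the small side (the "broadcast" copy).
    lookup: Dict[object, List[Row]] = {}
    for row in small_rows:
        lookup.setdefault(row[key], []).append(row)

    joined: List[Row] = []
    # Each partition probes the lookup table locally; the large side is never moved.
    for partition in large_partitions:
        for row in partition:
            for match in lookup.get(row[key], []):
                joined.append({**match, **row})
    return joined


# Hypothetical sample data: two partitions of the large table, one small table.
transactions_partitions = [
    [{"transactionId": 1, "itemId": 10}, {"transactionId": 2, "itemId": 20}],
    [{"transactionId": 3, "itemId": 10}, {"transactionId": 4, "itemId": 99}],
]
items = [{"itemId": 10, "itemName": "pen"}, {"itemId": 20, "itemName": "ink"}]

result = broadcast_hash_join(transactions_partitions, items, "itemId")
```

Transaction 4 has no matching item, so this inner-join sketch leaves it out of the result.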

Which of the following describes characteristics of the Spark UI?

A.

Via the Spark UI, workloads can be manually distributed across executors.

B.

Via the Spark UI, stage execution speed can be modified.

C.

The Scheduler tab shows how jobs that are run in parallel by multiple users are distributed across the cluster.

D.

There is a place in the Spark UI that shows the property spark.executor.memory.

E.

Some of the tabs in the Spark UI are named Jobs, Stages, Storage, DAGs, Executors, and SQL.

Which of the following describes characteristics of the Spark driver?

A.

The Spark driver requests the transformation of operations into DAG computations from the worker nodes.

B.

If set in the Spark configuration, Spark scales the Spark driver horizontally to improve parallel processing performance.

C.

The Spark driver processes partitions in an optimized, distributed fashion.

D.

In a non-interactive Spark application, the Spark driver automatically creates the SparkSession object.

E.

The Spark driver's responsibility includes scheduling queries for execution on worker nodes.

Which of the following is the deepest level in Spark's execution hierarchy?

A.

Job

B.

Task

C.

Executor

D.

Slot

E.

Stage
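
The nesting this question refers to can be sketched with a few plain-Python data classes. The class and function names here are hypothetical and only model the relationship: a job is split into stages, and each stage runs one task per partition. Executors and slots are resources that run tasks, so the sketch does not model them as units of work.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    partition_index: int  # a task processes exactly one partition


@dataclass
class Stage:
    stage_id: int
    tasks: List[Task] = field(default_factory=list)


@dataclass
class Job:
    job_id: int
    stages: List[Stage] = field(default_factory=list)


def build_job(job_id: int, partitions_per_stage: List[int]) -> Job:
    """Create one stage per entry, with one task per partition in that stage."""
    stages = [
        Stage(stage_id=i, tasks=[Task(p) for p in range(n)])
        for i, n in enumerate(partitions_per_stage)
    ]
    return Job(job_id=job_id, stages=stages)


# Hypothetical job: a 4-partition stage followed by a 2-partition stage.
job = build_job(0, [4, 2])
task_count = sum(len(stage.tasks) for stage in job.stages)
```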

Which of the following is a characteristic of the cluster manager?

A.

Each cluster manager works on a single partition of data.

B.

The cluster manager receives input from the driver through the SparkContext.

C.

The cluster manager does not exist in standalone mode.

D.

The cluster manager transforms jobs into DAGs.

E.

In client mode, the cluster manager runs on the edge node.

The code block displayed below contains an error. The code block is intended to perform an outer join of DataFrames transactionsDf and itemsDf on columns productId and itemId, respectively.

Find the error.

Code block:

transactionsDf.join(itemsDf, [itemsDf.itemId, transactionsDf.productId], "outer")

A.

The "outer" argument should be eliminated, since "outer" is the default join type.

B.

The join type needs to be appended to the join() operator, like join().outer() instead of listing it as the last argument inside the join() call.

C.

The term [itemsDf.itemId, transactionsDf.productId] should be replaced by itemsDf.itemId == transactionsDf.productId.

D.

The term [itemsDf.itemId, transactionsDf.productId] should be replaced by itemsDf.col("itemId") == transactionsDf.col("productId").

E.

The "outer" argument should be eliminated from the call and join should be replaced by joinOuter.
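
To make the intended result concrete, the following plain-Python sketch performs a full outer join on differently named key columns. It is not Spark code; full_outer_join and the sample rows are hypothetical. In PySpark, a join on differently named columns is usually expressed as a Column comparison, for example transactionsDf.productId == itemsDf.itemId, passed as the second argument of join.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[object]]


def full_outer_join(left: List[Row], right: List[Row],
                    left_key: str, right_key: str) -> List[Row]:
    """Keep every row from both sides; fill the missing side with None."""
    left_cols = list(left[0]) if left else []
    right_cols = list(right[0]) if right else []
    joined: List[Row] = []
    matched_right = set()

    for left_row in left:
        hits = [i for i, right_row in enumerate(right)
                if right_row[right_key] == left_row[left_key]]
        if hits:
            for i in hits:
                matched_right.add(i)
                joined.append({**left_row, **right[i]})
        else:
            # Left row without a partner: right-side columns become None.
            joined.append({**left_row, **{c: None for c in right_cols}})

    # Right rows that never matched: left-side columns become None.
    for i, right_row in enumerate(right):
        if i not in matched_right:
            joined.append({**{c: None for c in left_cols}, **right_row})
    return joined


# Hypothetical sample data.
transactions = [{"transactionId": 1, "productId": 1},
                {"transactionId": 2, "productId": 5}]
items = [{"itemId": 1, "itemName": "pen"},
         {"itemId": 3, "itemName": "ink"}]

outer = full_outer_join(transactions, items, "productId", "itemId")
```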

Which of the following code blocks uses a schema fileSchema to read a parquet file at location filePath into a DataFrame?

A.

spark.read.schema(fileSchema).format("parquet").load(filePath)

B.

spark.read.schema("fileSchema").format("parquet").load(filePath)

C.

spark.read().schema(fileSchema).parquet(filePath)

D.

spark.read().schema(fileSchema).format(parquet).load(filePath)

E.

spark.read.schema(fileSchema).open(filePath)

Which of the following is not a feature of Adaptive Query Execution?

A.

Replace a sort merge join with a broadcast join, where appropriate.

B.

Coalesce partitions to accelerate data processing.

C.

Split skewed partitions into smaller partitions to avoid differences in partition processing time.

D.

Reroute a query in case of an executor failure.

E.

Collect runtime statistics during query execution.

Which of the following code blocks efficiently converts DataFrame transactionsDf from 12 into 24 partitions?

A.

transactionsDf.repartition(24, boost=True)

B.

transactionsDf.repartition()

C.

transactionsDf.repartition("itemId", 24)

D.

transactionsDf.coalesce(24)

E.

transactionsDf.repartition(24)
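
The difference between the two operators named in the options can be illustrated with a simplified plain-Python model. The functions below are hypothetical stand-ins, not Spark's implementation: coalesce here only merges existing partitions and therefore cannot raise the partition count, while repartition redistributes every row across the requested number of partitions, which corresponds to a full shuffle in Spark.

```python
from typing import List, Tuple

Partition = List[Tuple[int, int]]


def coalesce(partitions: List[Partition], n: int) -> List[Partition]:
    """Merge whole partitions without moving individual rows; never increases the count."""
    if n >= len(partitions):
        return [list(p) for p in partitions]
    merged: List[Partition] = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        merged[i % n].extend(part)  # simplified merge strategy (assumption)
    return merged


def repartition(partitions: List[Partition], n: int) -> List[Partition]:
    """Redistribute all rows round-robin across n new partitions (a full shuffle)."""
    out: List[Partition] = [[] for _ in range(n)]
    i = 0
    for part in partitions:
        for row in part:
            out[i % n].append(row)
            i += 1
    return out


# Hypothetical data: 12 partitions with 4 rows each.
partitions = [[(p, r) for r in range(4)] for p in range(12)]

after_coalesce = coalesce(partitions, 24)
after_repartition = repartition(partitions, 24)
```

In this model, asking coalesce for 24 partitions leaves the count at 12, whereas repartition produces 24 partitions of 2 rows each.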

The code block displayed below contains multiple errors. The code block should return a DataFrame that contains only columns transactionId, predError, value and storeId of DataFrame transactionsDf. Find the errors.

Code block:

transactionsDf.select([col(productId), col(f)])

Sample of transactionsDf:

+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
+-------------+---------+-----+-------+---------+----+

A.

The column names should be listed directly as arguments to the operator and not as a list.

B.

The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all column names should be expressed as strings without being wrapped in a col() operator.

C.

The select operator should be replaced by a drop operator.

D.

The column names should be listed directly as arguments to the operator and not as a list and following the pattern of how column names are expressed in the code block, columns productId and f should be replaced by transactionId, predError, value and storeId.

E.

The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all col() operators should be removed.
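
The two ways of arriving at the target columns, keeping the wanted ones or dropping the unwanted ones, can be compared on the sample rows with a plain-Python sketch. The helpers select_columns and drop_columns are hypothetical and only mimic the behavior on lists of dictionaries. In PySpark, DataFrame.drop accepts column names as separate string arguments.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[int]]

# Rows copied from the sample of transactionsDf shown in the question.
rows: List[Row] = [
    {"transactionId": 1, "predError": 3, "value": 4, "storeId": 25, "productId": 1, "f": None},
    {"transactionId": 2, "predError": 6, "value": 7, "storeId": 2, "productId": 2, "f": None},
    {"transactionId": 3, "predError": 3, "value": None, "storeId": 25, "productId": 3, "f": None},
]


def select_columns(data: List[Row], *names: str) -> List[Row]:
    """Keep only the named columns, in the order given."""
    return [{name: row[name] for name in names} for row in data]


def drop_columns(data: List[Row], *names: str) -> List[Row]:
    """Remove the named columns and keep everything else."""
    return [{k: v for k, v in row.items() if k not in names} for row in data]


kept = select_columns(rows, "transactionId", "predError", "value", "storeId")
dropped = drop_columns(rows, "productId", "f")
```

Both calls produce the same four-column rows, which is why either operator can be used to reach the result the question asks for.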