Weekend Sale Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: xmas50

Databricks Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 - Databricks Certified Associate Developer for Apache Spark 3.0 Exam

The code block displayed below contains an error. When the code block below has executed, it should have divided DataFrame transactionsDf into 14 parts, based on columns storeId and

transactionDate (in this order). Find the error.

Code block:

transactionsDf.coalesce(14, ("storeId", "transactionDate"))

A.

The parentheses around the column names need to be removed and .select() needs to be appended to the code block.

B.

Operator coalesce needs to be replaced by repartition, the parentheses around the column names need to be removed, and .count() needs to be appended to the code block.

(Correct)

C.

Operator coalesce needs to be replaced by repartition, the parentheses around the column names need to be removed, and .select() needs to be appended to the code block.

D.

Operator coalesce needs to be replaced by repartition and the parentheses around the column names need to be replaced by square brackets.

E.

Operator coalesce needs to be replaced by repartition.

The code block shown below should return a DataFrame with only columns from DataFrame transactionsDf for which there is a corresponding transactionId in DataFrame itemsDf. DataFrame

itemsDf is very small and much smaller than DataFrame transactionsDf. The query should be executed in an optimized way. Choose the answer that correctly fills the blanks in the code block to

accomplish this.

__1__.__2__(__3__, __4__, __5__)

A.

1. transactionsDf

2. join

3. broadcast(itemsDf)

4. transactionsDf.transactionId==itemsDf.transactionId

5. "outer"

B.

1. transactionsDf

2. join

3. itemsDf

4. transactionsDf.transactionId==itemsDf.transactionId

5. "anti"

C.

1. transactionsDf

2. join

3. broadcast(itemsDf)

4. "transactionId"

5. "left_semi"

D.

1. itemsDf

2. broadcast

3. transactionsDf

4. "transactionId"

5. "left_semi"

E.

1. itemsDf

2. join

3. broadcast(transactionsDf)

4. "transactionId"

5. "left_semi"

The code block displayed below contains an error. The code block should create DataFrame itemsAttributesDf which has columns itemId and attribute and lists every attribute from the attributes column in DataFrame itemsDf next to the itemId of the respective row in itemsDf. Find the error.

A sample of DataFrame itemsDf is below.

Code block:

itemsAttributesDf = itemsDf.explode("attributes").alias("attribute").select("attribute", "itemId")

A.

Since itemId is the index, it does not need to be an argument to the select() method.

B.

The alias() method needs to be called after the select() method.

C.

The explode() method expects a Column object rather than a string.

D.

explode() is not a method of DataFrame. explode() should be used inside the select() method instead.

E.

The split() method should be used inside the select() method instead of the explode() method.

Which of the following statements about garbage collection in Spark is incorrect?

A.

Garbage collection information can be accessed in the Spark UI's stage detail view.

B.

Optimizing garbage collection performance in Spark may limit caching ability.

C.

Manually persisting RDDs in Spark prevents them from being garbage collected.

D.

In Spark, using the G1 garbage collector is an alternative to using the default Parallel garbage collector.

E.

Serialized caching is a strategy to increase the performance of garbage collection.