
Databricks Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 - Databricks Certified Associate Developer for Apache Spark 3.0 Exam

Which of the following code blocks returns a DataFrame that matches the multi-column DataFrame itemsDf, except that integer column itemId has been converted into a string column?

A.

itemsDf.withColumn("itemId", convert("itemId", "string"))

B.

itemsDf.withColumn("itemId", col("itemId").cast("string"))

(Correct)

C.

itemsDf.select(cast("itemId", "string"))

D.

itemsDf.withColumn("itemId", col("itemId").convert("string"))

E.

spark.cast(itemsDf, "itemId", "string")

Which of the following statements about the differences between actions and transformations is correct?

A.

Actions are evaluated lazily, while transformations are not evaluated lazily.

B.

Actions generate RDDs, while transformations do not.

C.

Actions do not send results to the driver, while transformations do.

D.

Actions can be queued for delayed execution, while transformations can only be processed immediately.

E.

Actions can trigger Adaptive Query Execution, while transformations cannot.

Which of the following is the idea behind dynamic partition pruning in Spark?

A.

Dynamic partition pruning is intended to skip over the data you do not need in the results of a query.

B.

Dynamic partition pruning concatenates columns of similar data types to optimize join performance.

C.

Dynamic partition pruning performs wide transformations on disk instead of in memory.

D.

Dynamic partition pruning reoptimizes physical plans based on data types and broadcast variables.

E.

Dynamic partition pruning reoptimizes query plans based on runtime statistics collected during query execution.

Which of the following code blocks generally causes a great amount of network traffic?

A.

DataFrame.select()

B.

DataFrame.coalesce()

C.

DataFrame.collect()

D.

DataFrame.rdd.map()

E.

DataFrame.count()

Which of the following code blocks returns a copy of DataFrame transactionsDf that only includes columns transactionId, storeId, productId and f?

Sample of DataFrame transactionsDf:

+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
+-------------+---------+-----+-------+---------+----+

A.

transactionsDf.drop(col("value"), col("predError"))

B.

transactionsDf.drop("predError", "value")

C.

transactionsDf.drop(value, predError)

D.

transactionsDf.drop(["predError", "value"])

E.

transactionsDf.drop([col("predError"), col("value")])

Which of the following code blocks writes DataFrame itemsDf to disk at storage location filePath, replacing any existing data at that location?

A.

itemsDf.write.mode("overwrite").parquet(filePath)

B.

itemsDf.write.option("parquet").mode("overwrite").path(filePath)

C.

itemsDf.write(filePath, mode="overwrite")

D.

itemsDf.write.mode("overwrite").path(filePath)

E.

itemsDf.write().parquet(filePath, mode="overwrite")

Which of the following code blocks prints the number of rows in which the expression Inc. appears in the string-type column supplier of DataFrame itemsDf?

A.

counter = 0

for index, row in itemsDf.iterrows():
    if 'Inc.' in row['supplier']:
        counter = counter + 1

print(counter)

B.

counter = 0

def count(x):
    if 'Inc.' in x['supplier']:
        counter = counter + 1

itemsDf.foreach(count)
print(counter)

C.

print(itemsDf.foreach(lambda x: 'Inc.' in x))

D.

print(itemsDf.foreach(lambda x: 'Inc.' in x).sum())

E.

accum=sc.accumulator(0)

def check_if_inc_in_supplier(row):
    if 'Inc.' in row['supplier']:
        accum.add(1)

itemsDf.foreach(check_if_inc_in_supplier)
print(accum.value)

Which of the following code blocks returns a single row from DataFrame transactionsDf?

Full DataFrame transactionsDf:

+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
|            4|     null| null|      3|        2|null|
|            5|     null| null|   null|        2|null|
|            6|        3|    2|     25|        2|null|
+-------------+---------+-----+-------+---------+----+

A.

transactionsDf.where(col("storeId").between(3,25))

B.

transactionsDf.filter((col("storeId")!=25) | (col("productId")==2))

C.

transactionsDf.filter(col("storeId")==25).select("predError","storeId").distinct()

D.

transactionsDf.select("productId", "storeId").where("storeId == 2 OR storeId != 25")

E.

transactionsDf.where(col("value").isNull()).select("productId", "storeId").distinct()

Which of the following code blocks returns a new DataFrame with only columns predError and value of every second row of DataFrame transactionsDf?

Entire DataFrame transactionsDf:

+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
|            4|     null| null|      3|        2|null|
|            5|     null| null|   null|        2|null|
|            6|        3|    2|     25|        2|null|
+-------------+---------+-----+-------+---------+----+

A.

transactionsDf.filter(col("transactionId").isin([3,4,6])).select([predError, value])

B.

transactionsDf.select(col("transactionId").isin([3,4,6]), "predError", "value")

C.

transactionsDf.filter("transactionId" % 2 == 0).select("predError", "value")

D.

transactionsDf.filter(col("transactionId") % 2 == 0).select("predError", "value")

(Correct)

E.

transactionsDf.createOrReplaceTempView("transactionsDf")
spark.sql("FROM transactionsDf SELECT predError, value WHERE transactionId % 2 = 2")

F.

transactionsDf.filter(col(transactionId).isin([3,4,6]))

The code block shown below should return a new 2-column DataFrame that shows one attribute from column attributes per row next to the associated itemName, for all suppliers in column supplier whose name includes Sports. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Sample of DataFrame itemsDf:

+------+----------------------------------+-----------------------------+-------------------+
|itemId|itemName                          |attributes                   |supplier           |
+------+----------------------------------+-----------------------------+-------------------+
|1     |Thick Coat for Walking in the Snow|[blue, winter, cozy]         |Sports Company Inc.|
|2     |Elegant Outdoors Summer Dress     |[red, summer, fresh, cooling]|YetiX              |
|3     |Outdoors Backpack                 |[green, summer, travel]      |Sports Company Inc.|
+------+----------------------------------+-----------------------------+-------------------+

Code block:

itemsDf.__1__(__2__).select(__3__, __4__)

A.

1. filter

2. col("supplier").isin("Sports")

3. "itemName"

4. explode(col("attributes"))

B.

1. where

2. col("supplier").contains("Sports")

3. "itemName"

4. "attributes"

C.

1. where

2. col(supplier).contains("Sports")

3. explode(attributes)

4. itemName

D.

1. where

2. "Sports".isin(col("Supplier"))

3. "itemName"

4. array_explode("attributes")

E.

1. filter

2. col("supplier").contains("Sports")

3. "itemName"

4. explode("attributes")