
Databricks Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 - Databricks Certified Associate Developer for Apache Spark 3.0 Exam

Which of the following is a viable way to improve Spark's performance when dealing with large amounts of data, given that there is only a single application running on the cluster?

A.

Increase values for the properties spark.default.parallelism and spark.sql.shuffle.partitions

B.

Decrease values for the properties spark.default.parallelism and spark.sql.partitions

C.

Increase values for the properties spark.sql.parallelism and spark.sql.partitions

D.

Increase values for the properties spark.sql.parallelism and spark.sql.shuffle.partitions

E.

Increase values for the properties spark.dynamicAllocation.maxExecutors, spark.default.parallelism, and spark.sql.shuffle.partitions
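For reference, a minimal sketch of how the two properties named in option A could be raised when the session is created (both are standard Spark settings; the values below are only illustrative, not recommendations):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("parallelism-demo")
         .config("spark.default.parallelism", 200)     # default partition count for RDD operations
         .config("spark.sql.shuffle.partitions", 200)  # partition count used after DataFrame shuffles
         .getOrCreate())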

The code block displayed below contains an error. The code block should return DataFrame transactionsDf, but with the column storeId renamed to storeNumber. Find the error.

Code block:

transactionsDf.withColumn("storeNumber", "storeId")

A.

Instead of withColumn, the withColumnRenamed method should be used.

B.

Arguments "storeNumber" and "storeId" each need to be wrapped in a col() operator.

C.

Argument "storeId" should be the first and argument "storeNumber" should be the second argument to the withColumn method.

D.

The withColumn operator should be replaced with the copyDataFrame operator.

E.

Instead of withColumn, the withColumnRenamed method should be used and argument "storeId" should be the first and argument "storeNumber" should be the second argument to that method.
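For reference, a short sketch of a rename using withColumnRenamed, which takes the existing column name first and the new name second (assumes transactionsDf already exists):

transactionsDf.withColumnRenamed("storeId", "storeNumber")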

Which of the following describes the role of tasks in the Spark execution hierarchy?

A.

Tasks are the smallest element in the execution hierarchy.

B.

Within one task, the slots are the unit of work done for each partition of the data.

C.

Tasks are the second-smallest element in the execution hierarchy.

D.

Stages with narrow dependencies can be grouped into one task.

E.

Tasks with wide dependencies can be grouped into one stage.

The code block shown below should return all rows of DataFrame itemsDf that have at least 3 items in column itemNameElements. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Example of DataFrame itemsDf:

+------+----------------------------------+-------------------+------------------------------------------+
|itemId|itemName                          |supplier           |itemNameElements                          |
+------+----------------------------------+-------------------+------------------------------------------+
|1     |Thick Coat for Walking in the Snow|Sports Company Inc.|[Thick, Coat, for, Walking, in, the, Snow]|
|2     |Elegant Outdoors Summer Dress     |YetiX              |[Elegant, Outdoors, Summer, Dress]        |
|3     |Outdoors Backpack                 |Sports Company Inc.|[Outdoors, Backpack]                      |
+------+----------------------------------+-------------------+------------------------------------------+

Code block:

itemsDf.__1__(__2__(__3__)__4__)

A.

1. select

2. count

3. col("itemNameElements")

4. >3

B.

1. filter

2. count

3. itemNameElements

4. >=3

C.

1. select

2. count

3. "itemNameElements"

4. >3

D.

1. filter

2. size

3. "itemNameElements"

4. >=3

(Correct)

E.

1. select

2. size

3. "itemNameElements"

4. >3
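As a sketch of the filled-in code block, using the size function from pyspark.sql.functions, which returns the number of elements in an array column:

from pyspark.sql.functions import size

itemsDf.filter(size("itemNameElements") >= 3)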

Which of the following code blocks returns all unique values of column storeId in DataFrame transactionsDf?

A.

transactionsDf["storeId"].distinct()

B.

transactionsDf.select("storeId").distinct()

(Correct)

C.

transactionsDf.filter("storeId").distinct()

D.

transactionsDf.select(col("storeId").distinct())

E.

transactionsDf.distinct("storeId")
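For reference, a short usage sketch of the select/distinct combination from option B (assumes transactionsDf exists; show() is only added to display the result):

transactionsDf.select("storeId").distinct().show()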

The code block displayed below contains an error. The code block should read the csv file located at path data/transactions.csv into DataFrame transactionsDf, using the first row as the column header and casting the columns to the most appropriate types. Find the error.

First 3 rows of transactions.csv:

transactionId;storeId;productId;name
1;23;12;green grass
2;35;31;yellow sun
3;23;12;green grass

Code block:

transactionsDf = spark.read.load("data/transactions.csv", sep=";", format="csv", header=True)

A.

The DataFrameReader is not accessed correctly.

B.

The transaction is evaluated lazily, so no file will be read.

C.

Spark is unable to understand the file type.

D.

The code block is unable to capture all columns.

E.

The resulting DataFrame will not have the appropriate schema.
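A minimal sketch of one way to obtain an appropriately typed schema with the DataFrameReader: adding the standard inferSchema option makes Spark sample the file and cast columns such as transactionId and storeId to numeric types instead of leaving every column as a string (the rest of the call mirrors the code block above):

transactionsDf = spark.read.load("data/transactions.csv", sep=";", format="csv",
                                 header=True, inferSchema=True)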

Which of the following are valid execution modes?

A.

Kubernetes, Local, Client

B.

Client, Cluster, Local

C.

Server, Standalone, Client

D.

Cluster, Server, Local

E.

Standalone, Client, Cluster
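For context, a small sketch of how an execution mode is chosen in practice: local mode can be requested when the session is built, while client and cluster modes are selected at submission time (the master and file names below are placeholders, not part of the question):

from pyspark.sql import SparkSession

# Local mode: driver and executors run inside a single JVM on this machine.
spark = SparkSession.builder.master("local[*]").appName("mode-demo").getOrCreate()

# Client vs. cluster mode is typically chosen when submitting the application, e.g.:
#   spark-submit --master yarn --deploy-mode client  app.py
#   spark-submit --master yarn --deploy-mode cluster app.py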

Which of the following code blocks returns a copy of DataFrame transactionsDf where the column storeId has been converted to string type?

A.

transactionsDf.withColumn("storeId", convert("storeId", "string"))

B.

transactionsDf.withColumn("storeId", col("storeId", "string"))

C.

transactionsDf.withColumn("storeId", col("storeId").convert("string"))

D.

transactionsDf.withColumn("storeId", col("storeId").cast("string"))

E.

transactionsDf.withColumn("storeId", convert("storeId").as("string"))

Which of the following describes slots?

A.

Slots are dynamically created and destroyed in accordance with an executor's workload.

B.

To optimize I/O performance, Spark stores data on disk in multiple slots.

C.

A Java Virtual Machine (JVM) working as an executor can be considered as a pool of slots for task execution.

D.

A slot is always limited to a single core.

E.

Slots are the communication interface for executors and are used for receiving commands and sending results to the driver.

Which of the following describes Spark's way of managing memory?

A.

Spark uses a subset of the reserved system memory.

B.

Storage memory is used for caching partitions derived from DataFrames.

C.

As a general rule for garbage collection, Spark performs better on many small objects than few big objects.

D.

Disabling serialization potentially greatly reduces the memory footprint of a Spark application.

E.

Spark's memory usage can be divided into three categories: Execution, transaction, and storage.
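As a small, hedged illustration of the storage-memory statement in option B (assumes transactionsDf exists): caching a DataFrame keeps its partitions in storage memory once an action materializes them.

transactionsDf.cache()   # mark the DataFrame for caching in storage memory
transactionsDf.count()   # an action triggers computation and populates the cache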