Databricks Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 - Databricks Certified Associate Developer for Apache Spark 3.0 Exam
Total 180 questions
Which of the following code blocks immediately removes the previously cached DataFrame transactionsDf from memory and disk?
The code block displayed below contains one or more errors. The code block should load parquet files at location filePath into a DataFrame, only loading those files that have been modified before
2029-03-20 05:44:46. Spark should enforce a schema according to the schema shown below. Find the error.
Schema:
1.root
2. |-- itemId: integer (nullable = true)
3. |-- attributes: array (nullable = true)
4. | |-- element: string (containsNull = true)
5. |-- supplier: string (nullable = true)
Code block:
1.schema = StructType([
2. StructType("itemId", IntegerType(), True),
3. StructType("attributes", ArrayType(StringType(), True), True),
4. StructType("supplier", StringType(), True)
5.])
6.
7.spark.read.options("modifiedBefore", "2029-03-20T05:44:46").schema(schema).load(filePath)
Which of the following code blocks returns a DataFrame with a single column in which all items in column attributes of DataFrame itemsDf are listed that contain the letter i?
Sample of DataFrame itemsDf:
1.+------+----------------------------------+-----------------------------+-------------------+
2.|itemId|itemName |attributes |supplier |
3.+------+----------------------------------+-----------------------------+-------------------+
4.|1 |Thick Coat for Walking in the Snow|[blue, winter, cozy] |Sports Company Inc.|
5.|2 |Elegant Outdoors Summer Dress |[red, summer, fresh, cooling]|YetiX |
6.|3 |Outdoors Backpack |[green, summer, travel] |Sports Company Inc.|
7.+------+----------------------------------+-----------------------------+-------------------+
Which of the following code blocks concatenates rows of DataFrames transactionsDf and transactionsNewDf, omitting any duplicates?
The code block shown below should add a column itemNameBetweenSeparators to DataFrame itemsDf. The column should contain arrays of maximum 4 strings. The arrays should be composed of
the values in column itemsDf which are separated at - or whitespace characters. Choose the answer that correctly fills the blanks in the code block to accomplish this.
Sample of DataFrame itemsDf:
1.+------+----------------------------------+-------------------+
2.|itemId|itemName |supplier |
3.+------+----------------------------------+-------------------+
4.|1 |Thick Coat for Walking in the Snow|Sports Company Inc.|
5.|2 |Elegant Outdoors Summer Dress |YetiX |
6.|3 |Outdoors Backpack |Sports Company Inc.|
7.+------+----------------------------------+-------------------+
Code block:
itemsDf.__1__(__2__, __3__(__4__, "[\s\-]", __5__))
The code block shown below should return a copy of DataFrame transactionsDf with an added column cos. This column should have the values in column value converted to degrees and having
the cosine of those converted values taken, rounded to two decimals. Choose the answer that correctly fills the blanks in the code block to accomplish this.
Code block:
transactionsDf.__1__(__2__, round(__3__(__4__(__5__)),2))
The code block shown below should return a copy of DataFrame transactionsDf without columns value and productId and with an additional column associateId that has the value 5. Choose the
answer that correctly fills the blanks in the code block to accomplish this.
transactionsDf.__1__(__2__, __3__).__4__(__5__, 'value')
In which order should the code blocks shown below be run in order to assign articlesDf a DataFrame that lists all items in column attributes ordered by the number of times these items occur, from
most to least often?
Sample of DataFrame articlesDf:
1.+------+-----------------------------+-------------------+
2.|itemId|attributes |supplier |
3.+------+-----------------------------+-------------------+
4.|1 |[blue, winter, cozy] |Sports Company Inc.|
5.|2 |[red, summer, fresh, cooling]|YetiX |
6.|3 |[green, summer, travel] |Sports Company Inc.|
7.+------+-----------------------------+-------------------+
Which of the following code blocks applies the boolean-returning Python function evaluateTestSuccess to column storeId of DataFrame transactionsDf as a user-defined function?
The code block displayed below contains an error. The code block should merge the rows of DataFrames transactionsDfMonday and transactionsDfTuesday into a new DataFrame, matching
column names and inserting null values where column names do not appear in both DataFrames. Find the error.
Sample of DataFrame transactionsDfMonday:
1.+-------------+---------+-----+-------+---------+----+
2.|transactionId|predError|value|storeId|productId| f|
3.+-------------+---------+-----+-------+---------+----+
4.| 5| null| null| null| 2|null|
5.| 6| 3| 2| 25| 2|null|
6.+-------------+---------+-----+-------+---------+----+
Sample of DataFrame transactionsDfTuesday:
1.+-------+-------------+---------+-----+
2.|storeId|transactionId|productId|value|
3.+-------+-------------+---------+-----+
4.| 25| 1| 1| 4|
5.| 2| 2| 2| 7|
6.| 3| 4| 2| null|
7.| null| 5| 2| null|
8.+-------+-------------+---------+-----+
Code block:
sc.union([transactionsDfMonday, transactionsDfTuesday])