Databricks Certified Associate Developer…
Q: 1
In which order should the code blocks shown below be run to create a DataFrame that
shows the mean of column predError of DataFrame transactionsDf per column storeId and productId,
where productId should be either 2 or 3 and the returned DataFrame should be sorted in ascending
order by column storeId, leaving out any nulls in that column?
DataFrame transactionsDf:
+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
|            4|     null| null|      3|        2|null|
|            5|     null| null|   null|        2|null|
|            6|        3|    2|     25|        2|null|
+-------------+---------+-----+-------+---------+----+
1. .mean("predError")
2. .groupBy("storeId")
3. .orderBy("storeId")
4. transactionsDf.filter(transactionsDf.storeId.isNotNull())
5. .pivot("productId", [2, 3])
Options
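The result the question asks for can be sanity-checked with a small model. The sketch below is plain Python, not PySpark: it assumes the blocks are chained in the order 4, 2, 5, 1, 3 (filter, groupBy, pivot, mean, orderBy) and imitates that chain on the sample rows, with None standing in for null. The helper name pivot_mean is hypothetical.

```python
# Plain-Python model of the assumed PySpark chain:
#   transactionsDf.filter(transactionsDf.storeId.isNotNull())
#       .groupBy("storeId").pivot("productId", [2, 3])
#       .mean("predError").orderBy("storeId")
# Each row is (transactionId, predError, value, storeId, productId, f).
rows = [
    (1, 3, 4, 25, 1, None),
    (2, 6, 7, 2, 2, None),
    (3, 3, None, 25, 3, None),
    (4, None, None, 3, 2, None),
    (5, None, None, None, 2, None),
    (6, 3, 2, 25, 2, None),
]


def pivot_mean(rows, pivot_values):
    """Return {storeId: {productId: mean predError or None}}, sorted by storeId."""
    groups = {}
    for _tid, pred_error, _value, store_id, product_id, _f in rows:
        if store_id is None:
            continue  # models the isNotNull() filter on storeId
        cells = groups.setdefault(store_id, {p: [] for p in pivot_values})
        # Only the listed pivot values become columns; nulls do not enter the mean.
        if product_id in cells and pred_error is not None:
            cells[product_id].append(pred_error)
    return {
        store_id: {p: (sum(v) / len(v) if v else None) for p, v in cells.items()}
        for store_id, cells in sorted(groups.items())
    }


result = pivot_mean(rows, [2, 3])
for store_id, cells in result.items():
    print(store_id, cells)
```

In this model, store 3 still appears in the output even though its only row has a null predError, and the row with productId 1 contributes to neither pivot column.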
Q: 2
Which of the following describes the characteristics of accumulators?
Options
Q: 3
Which of the following code blocks removes all rows in the 6-column DataFrame transactionsDf that
have missing data in at least 3 columns?
Options
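The arithmetic behind this question is easy to get wrong. PySpark's DataFrame.dropna accepts a thresh parameter, and rows with fewer than thresh non-null values are dropped. With 6 columns, "missing data in at least 3 columns" is the same as "fewer than 4 non-null values", which points to a call such as transactionsDf.dropna(thresh=4). The sketch below models that rule in plain Python; drop_sparse_rows is a hypothetical helper, and the rows are the sample data from Q: 1.

```python
def drop_sparse_rows(rows, thresh):
    """Keep rows that have at least `thresh` non-None values."""
    return [row for row in rows if sum(v is not None for v in row) >= thresh]


NUM_COLUMNS = 6
MAX_MISSING_ALLOWED = 2  # rows with 3 or more missing values must be removed
thresh = NUM_COLUMNS - MAX_MISSING_ALLOWED  # minimum non-null count to keep a row

# Each row is (transactionId, predError, value, storeId, productId, f).
rows = [
    (1, 3, 4, 25, 1, None),
    (2, 6, 7, 2, 2, None),
    (3, 3, None, 25, 3, None),
    (4, None, None, 3, 2, None),
    (5, None, None, None, 2, None),
    (6, 3, 2, 25, 2, None),
]

kept = drop_sparse_rows(rows, thresh)
print([row[0] for row in kept])  # transactionIds of the surviving rows
```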
Q: 4
The code block displayed below contains an error. The code block is intended to perform an outer join
of DataFrames transactionsDf and itemsDf on columns productId and itemId, respectively. Find the
error.
Code block:
transactionsDf.join(itemsDf, [itemsDf.itemId, transactionsDf.productId], "outer")
Options
Q: 5
The code block displayed below contains an error. When the code block below has executed, it
should have divided DataFrame transactionsDf into 14 parts, based on columns storeId and
transactionDate (in this order). Find the error.
Code block:
transactionsDf.coalesce(14, ("storeId", "transactionDate"))
Options
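For context, coalesce in PySpark takes only a target partition count, while repartition(numPartitions, *cols) also accepts columns, so a call such as transactionsDf.repartition(14, "storeId", "transactionDate") is the likely intent. The sketch below is a plain-Python model of partitioning by a hash of two columns. It is not Spark's implementation: Python's hash() stands in for Spark's own hash function, hash_partition is a hypothetical helper, and the transactionDate values are made up.

```python
from collections import defaultdict


def hash_partition(rows, num_partitions, key_columns):
    """Assign each row to a partition index derived from its key columns."""
    partitions = defaultdict(list)
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        # Python's hash() is only a stand-in for Spark's own hash function.
        partitions[hash(key) % num_partitions].append(row)
    return dict(partitions)


# Made-up sample rows; the transactionDate values are illustrative only.
rows = [
    {"transactionId": 1, "storeId": 25, "transactionDate": "2020-04-26"},
    {"transactionId": 2, "storeId": 2, "transactionDate": "2020-04-13"},
    {"transactionId": 3, "storeId": 25, "transactionDate": "2020-04-26"},
    {"transactionId": 4, "storeId": 3, "transactionDate": "2020-04-02"},
]

partitions = hash_partition(rows, 14, ("storeId", "transactionDate"))
for index, members in sorted(partitions.items()):
    print(index, [row["transactionId"] for row in members])
```

Rows that share both key values always land in the same partition, and with few distinct keys most of the 14 partitions stay empty.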
Q: 6
Which of the following code blocks returns only rows from DataFrame transactionsDf in which values
in column productId are unique?
Options
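One common reading of this question is de-duplication on a single column, which in PySpark is usually written as transactionsDf.dropDuplicates(subset=["productId"]). The sketch below models that behavior in plain Python on the productId values from Q: 1. It keeps the first row seen per productId, which is an assumption of the model: Spark does not promise which of the duplicate rows survives. The helper name drop_duplicates_on is hypothetical.

```python
def drop_duplicates_on(rows, column):
    """Keep one row per distinct value of `column` (the first one encountered)."""
    seen = set()
    unique_rows = []
    for row in rows:
        if row[column] not in seen:
            seen.add(row[column])
            unique_rows.append(row)
    return unique_rows


rows = [
    {"transactionId": 1, "productId": 1},
    {"transactionId": 2, "productId": 2},
    {"transactionId": 3, "productId": 3},
    {"transactionId": 4, "productId": 2},
    {"transactionId": 5, "productId": 2},
    {"transactionId": 6, "productId": 2},
]

deduplicated = drop_duplicates_on(rows, "productId")
print([row["transactionId"] for row in deduplicated])
```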
Q: 7
Which of the following code blocks returns about 150 randomly selected rows from the 1000-row
DataFrame transactionsDf, assuming that any row can appear more than once in the returned
DataFrame?
Options
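The word "about" in this question matters: sampling by fraction does not return an exact row count. In PySpark the call would typically look like transactionsDf.sample(True, 0.15), where True enables sampling with replacement and 0.15 is the expected fraction (150 of 1000). The sketch below is a plain-Python model that assumes each row is emitted a Poisson-distributed number of times with mean equal to the fraction. That assumption, and the helper names poisson and sample_with_replacement, are illustrative and not taken from the source.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def sample_with_replacement(rows, fraction, seed=None):
    """Emit each row a Poisson(fraction) number of times, so rows may repeat."""
    rng = random.Random(seed)
    sampled = []
    for row in rows:
        sampled.extend([row] * poisson(rng, fraction))
    return sampled


rows = list(range(1000))  # stand-in for the 1000 rows of transactionsDf
sampled = sample_with_replacement(rows, 0.15, seed=42)
print(len(sampled))  # close to 150, but rarely exactly 150
```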
Q: 8
Which of the following code blocks stores a part of the data in DataFrame itemsDf on executors?
Options
Q: 9
The code block displayed below contains an error. The code block should display the schema of
DataFrame transactionsDf. Find the error.
Code block:
transactionsDf.rdd.printSchema
Options
Q: 10
The code block displayed below contains an error. The code block should return all rows of
DataFrame transactionsDf, but only columns storeId and predError. Find the error.
Code block:
spark.collect(transactionsDf.select("storeId", "predError"))
Options