DATABRICKS CERTIFIED DATA ENGINEER ASSOC…
Q: 1
A data engineer has configured a Structured Streaming job to read from a table, manipulate the data,
and then perform a streaming write into a new table.
The code block used by the data engineer is below:
If the data engineer only wants the query to process all of the available data in as many batches as
required, which of the following lines of code should the data engineer use to fill in the blank?
Options
Q: 2
A data engineer has configured a Structured Streaming job to read from a table, manipulate the data,
and then perform a streaming write into a new table.
The code block used by the data engineer is below:
If the data engineer only wants the query to execute a micro-batch to process data every 5 seconds,
which of the following lines of code should the data engineer use to fill in the blank?
Options
Q: 3
Which of the following code blocks will remove the rows where the value in column age is greater
than 25 from the existing Delta table my_table and save the updated table?
Options
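The row-level effect described in this question can be sketched with Python's built-in sqlite3 module standing in for a Delta table. This is only an illustration under stated assumptions: SQLite is not Delta Lake, and the name column and the three sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a Delta table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("a", 20), ("b", 25), ("c", 31)],  # hypothetical sample rows
)

# Remove the rows whose age is greater than 25; the table itself is updated.
conn.execute("DELETE FROM my_table WHERE age > 25")
conn.commit()

remaining = conn.execute("SELECT name, age FROM my_table ORDER BY age").fetchall()
print(remaining)
```

Rows with age equal to 25 are kept, since the condition is strictly greater than 25.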
Q: 4
A data engineer needs to apply custom logic to identify employees with more than 5 years of
experience in array column employees in table stores. The custom logic should create a new column
exp_employees that is an array of all of the employees with more than 5 years of experience for each
row. In order to apply this custom logic at scale, the data engineer wants to use the FILTER higher-
order function.
Which of the following code blocks successfully completes this task?


Options
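What FILTER does to each row's array can be sketched in plain Python. This is a rough model, not Spark SQL: the field name years_exp, the store_id column, and the sample rows are assumptions made for illustration.

```python
# Hypothetical rows of the stores table; each row holds an array of employee records.
stores = [
    {"store_id": 1, "employees": [{"name": "Ana", "years_exp": 7},
                                  {"name": "Bo", "years_exp": 2}]},
    {"store_id": 2, "employees": [{"name": "Cy", "years_exp": 5}]},
]

def with_exp_employees(row):
    """Return a copy of row with exp_employees: employees with more than 5 years."""
    kept = [e for e in row["employees"] if e["years_exp"] > 5]
    return {**row, "exp_employees": kept}

# The predicate is applied element by element inside each row's array.
result = [with_exp_employees(row) for row in stores]
print(result)
```

The output keeps one row per store; a store with no qualifying employee gets an empty array rather than being dropped.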
Q: 5
In order for Structured Streaming to reliably track the exact progress of the processing so that it can
handle any kind of failure by restarting and/or reprocessing, which of the following two approaches
is used by Spark to record the offset range of the data being processed in each trigger?
Options
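The idea of recording an offset range before processing it can be sketched in plain Python. This is a simplified model, not Spark's implementation: the file names, the run_batch helper, and the temporary directory standing in for a checkpoint location are all hypothetical.

```python
import json
import os
import tempfile

def run_batch(log_dir, batch_id, new_range, process):
    """Write the offset range to a log before processing, then mark it committed."""
    offsets_file = os.path.join(log_dir, f"offsets_{batch_id}.json")
    if os.path.exists(offsets_file):
        # Restart after a failure: reuse the range that was already planned.
        with open(offsets_file) as f:
            planned = tuple(json.load(f))
    else:
        planned = tuple(new_range)
        with open(offsets_file, "w") as f:
            json.dump(planned, f)
    process(planned)  # may raise; the logged range survives the failure
    open(os.path.join(log_dir, f"commit_{batch_id}"), "w").close()
    return planned

log_dir = tempfile.mkdtemp()  # stands in for a checkpoint location

def failing(offset_range):
    raise RuntimeError("simulated crash")

try:
    run_batch(log_dir, 0, (0, 100), failing)
except RuntimeError:
    pass

# On restart more data may have arrived, but batch 0 replays its logged range.
replayed = run_batch(log_dir, 0, (0, 250), lambda offset_range: None)
print(replayed)
```

Because the range is persisted before any work starts, a restarted run processes exactly the same data for that batch.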
Q: 6
A data analyst has a series of queries in a SQL program. The data analyst wants this program to run
every day. They only want the final query in the program to run on Sundays. They ask for help from
the data engineering team to complete this task.
Which of the following approaches could be used by the data engineering team to complete this
task?
Options
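One way to express "run everything daily, but the final query only on Sundays" is a day-of-week check in Python. The sketch below is an assumption about how such control flow might look; the query names are placeholders and the queries_to_run helper is hypothetical.

```python
from datetime import date

def queries_to_run(today, queries):
    """Return all queries on Sundays, and all but the final one on other days."""
    is_sunday = today.weekday() == 6  # Monday is 0, Sunday is 6
    return list(queries) if is_sunday else list(queries[:-1])

queries = ["query_1", "query_2", "final_query"]  # hypothetical placeholders
sunday_plan = queries_to_run(date(2024, 1, 7), queries)  # a Sunday
monday_plan = queries_to_run(date(2024, 1, 8), queries)  # a Monday
print(sunday_plan, monday_plan)
```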
Q: 7
A data organization leader is upset about the data analysis team’s reports being different from the
data engineering team’s reports. The leader believes the siloed nature of their organization’s data
engineering and data analysis architectures is to blame.
Which of the following describes how a data lakehouse could alleviate this issue?
Options
Q: 8
A data engineer and data analyst are working together on a data pipeline. The data engineer is
working on the raw, bronze, and silver layers of the pipeline using Python, and the data analyst is
working on the gold layer of the pipeline using SQL. The raw source of the pipeline is a streaming
input. They now want to migrate their pipeline to use Delta Live Tables.
Which of the following changes will need to be made to the pipeline when migrating to Delta Live
Tables?
Options
Q: 9
A new data engineering team has been assigned to an ELT project. The new data engineering
team will need full privileges on the table sales to fully manage the project.
Which of the following commands can be used to grant full permissions on the table to the new
data engineering team?
Options
Q: 10
Which query is performing a streaming hop from raw data to a Bronze table?
A)
B)
C)
D)

Options
Q: 11
A dataset has been defined using Delta Live Tables and includes an expectations clause:
CONSTRAINT valid_timestamp EXPECT (timestamp > '2020-01-01') ON VIOLATION FAIL UPDATE
What is the expected behavior when a batch of data containing data that violates these constraints is
processed?
Options
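The all-or-nothing nature of ON VIOLATION FAIL UPDATE can be modeled in plain Python. This is a simplified sketch, not Delta Live Tables code: the apply_update helper, the ExpectationFailed exception, and the sample rows are hypothetical.

```python
from datetime import datetime

class ExpectationFailed(Exception):
    """Raised when a row violates the valid_timestamp constraint."""

def apply_update(target, batch):
    """Append batch to target only if every row has timestamp > 2020-01-01."""
    cutoff = datetime(2020, 1, 1)
    for row in batch:
        if not row["timestamp"] > cutoff:
            # FAIL UPDATE: abort the whole update; nothing from this batch is written.
            raise ExpectationFailed(f"valid_timestamp violated by row {row['id']}")
    target.extend(batch)

target = []
batch = [
    {"id": 1, "timestamp": datetime(2021, 6, 1)},
    {"id": 2, "timestamp": datetime(2019, 12, 31)},  # violates the constraint
]
try:
    apply_update(target, batch)
    failed = False
except ExpectationFailed:
    failed = True
print(failed, target)
```

In this model a single violating row stops the update, and even the valid row from the same batch is not written.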
Q: 12
Which file format is used for storing Delta Lake tables?
Options
Q: 13
Which of the following describes a scenario in which a data team will want to utilize cluster pools?
Options
Q: 14
Identify how the count_if function and the count where x is null can be used.
Consider a table random_values with the data below.
col1
0
1
2
NULL
2
3
What would be the output of the query below?
select count_if(col1 > 1) as count_a, count(*) as count_b, count(col1) as count_c from random_values
Options
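How the three aggregates treat NULL can be worked through in plain Python, assuming the column holds the six values 0, 1, 2, NULL, 2, 3 and using None to model NULL. This is a sketch of the counting rules, not Spark SQL itself.

```python
col1 = [0, 1, 2, None, 2, 3]  # None models SQL NULL

# count_if(col1 > 1): a NULL comparison is not true, so the NULL row is not counted.
count_a = sum(1 for v in col1 if v is not None and v > 1)

# count(*): counts every row, including the one holding NULL.
count_b = len(col1)

# count(col1): counts only the rows where col1 is not NULL.
count_c = sum(1 for v in col1 if v is not None)

print(count_a, count_b, count_c)
```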
Q: 15
Which of the following is stored in the Databricks customer's cloud account?
Options
Q: 16
Which of the following Git operations must be performed outside of Databricks Repos?
Options
Q: 17
A data engineer needs to determine whether to use the built-in Databricks Notebooks versioning or
version their project using Databricks Repos.
Which of the following is an advantage of using Databricks Repos over the Databricks Notebooks
versioning?
Options
Q: 18
Which of the following commands can be used to write data into a Delta table while avoiding the
writing of duplicate records?
Options
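Writing only the records that are not already present can be sketched in plain Python as an insert-only merge on a key. This is a rough model under assumptions: the merge_insert_only helper, the id key, and the sample rows are hypothetical, and the source is assumed to contain no duplicate keys of its own.

```python
def merge_insert_only(target, source, key):
    """Insert source rows whose key is not already present in target."""
    existing = {row[key] for row in target}
    # Rows that match an existing key are skipped; unmatched rows are inserted.
    target.extend(row for row in source if row[key] not in existing)
    return target

target = [{"id": 1, "v": "a"}]
source = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
merge_insert_only(target, source, "id")
print(target)
```

Running the same merge a second time leaves the target unchanged, which is the property that prevents duplicates on repeated loads.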
Q: 19
A data engineer needs to create a table in Databricks using data from a CSV file at location
/path/to/csv.
They run the following command:
Which of the following lines of code fills in the above blank to successfully complete the task?
Options
Q: 20
Which of the following statements regarding the relationship between Silver tables and Bronze
tables is always true?
Options
Q: 21
Which of the following tools is used by Auto Loader to process data incrementally?
Options
Q: 22
A data engineer has realized that the data files associated with a Delta table are incredibly small.
They want to compact the small files to form larger files to improve performance.
Which of the following keywords can be used to compact the small files?
Options
Q: 23
A data engineer who is new to using Python needs to create a Python function to add two integers
together and return the sum.
Which of the following code blocks can the data engineer use to complete this task?
A)
B)
C)
D)
E)

Options
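For reference, a Python function of the kind described takes two parameters and uses return to hand back the result. The sketch below is one possible form; the name add_integers and the sample arguments are hypothetical.

```python
def add_integers(x, y):
    """Return the sum of two integers."""
    return x + y  # return hands the value back; print would only display it

total = add_integers(2, 3)
print(total)
```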
Q: 24
A data engineer has a Python notebook in Databricks, but they need to use SQL to accomplish a
specific task within a cell. They still want all of the other cells to use Python without making any
changes to those cells.
Which of the following describes how the data engineer can use SQL within a cell of their Python
notebook?
Options
Q: 25
A data engineer has a Python variable table_name that they would like to use in a SQL query. They
want to construct a Python code block that will run the query using table_name.
They have the following incomplete code block:
____(f"SELECT customer_id, spend FROM {table_name}")
Which of the following can be used to fill in the blank to successfully complete the task?
Options
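The f-string in the incomplete code block substitutes the value of table_name into the query text before anything is executed. The sketch below shows only that interpolation step and leaves the blank unfilled; the value "sales" is a hypothetical example.

```python
table_name = "sales"  # hypothetical table name

# The f-string replaces {table_name} with the variable's current value.
query = f"SELECT customer_id, spend FROM {table_name}"
print(query)
```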