DATABRICKS CERTIFIED DATA ENGINEER ASSOC…
Q: 1
A data engineer has configured a Structured Streaming job to read from a table, manipulate the data,
and then perform a streaming write into a new table.
The code block used by the data engineer is below:
If the data engineer only wants the query to process all of the available data in as many batches as
required, which of the following lines of code should the data engineer use to fill in the blank?
Options
Q: 2
A data engineer has configured a Structured Streaming job to read from a table, manipulate the data,
and then perform a streaming write into a new table.
The code block used by the data engineer is below:
If the data engineer only wants the query to execute a micro-batch to process data every 5 seconds,
which of the following lines of code should the data engineer use to fill in the blank?
Options
Q: 3
Which of the following code blocks will remove the rows where the value in column age is greater
than 25 from the existing Delta table my_table and save the updated table?
Options
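The operation this question describes is a row-level delete with a predicate on age. Here is a minimal sketch of those semantics. It assumes SQLite from the Python standard library as a stand-in for a Delta table; the `name` column, the sample rows, and the helper `delete_older_than` are hypothetical. It illustrates the behavior and is not one of the exam options.

```python
import sqlite3


def delete_older_than(conn, threshold):
    """Delete rows whose age is strictly greater than threshold, then return what is left."""
    conn.execute("DELETE FROM my_table WHERE age > ?", (threshold,))
    conn.commit()
    return conn.execute("SELECT name, age FROM my_table ORDER BY age").fetchall()


# In-memory stand-in for my_table (hypothetical sample data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("a", 20), ("b", 25), ("c", 30)],
)

remaining = delete_older_than(conn, 25)
```

Because the predicate is strictly "greater than 25", a row with age exactly 25 stays in the table.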
Q: 4
A data engineer needs to apply custom logic to identify employees with more than 5 years of
experience in array column employees in table stores. The custom logic should create a new column
exp_employees that is an array of all of the employees with more than 5 years of experience for each
row. In order to apply this custom logic at scale, the data engineer wants to use the FILTER higher-
order function.
Which of the following code blocks successfully completes this task?


Options
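To make the intended behavior concrete: a FILTER higher-order function takes an array and a predicate, and returns the array elements for which the predicate is true. In Spark SQL this is typically written as `FILTER(employees, e -> <condition on e>)`. The sketch below mimics that per-row behavior in plain Python. The struct fields `name` and `years_exp`, the sample rows, and both helper functions are assumptions made for illustration, since the question does not give the schema of the array elements.

```python
def filter_array(values, predicate):
    """Plain-Python analogue of FILTER(array, x -> predicate(x))."""
    return [v for v in values if predicate(v)]


def add_exp_employees(rows):
    """Return new rows with an exp_employees array column added; input rows are not modified."""
    return [
        {
            **row,
            "exp_employees": filter_array(
                row["employees"], lambda e: e["years_exp"] > 5
            ),
        }
        for row in rows
    ]


# Hypothetical contents of the stores table.
stores = [
    {
        "store_id": 1,
        "employees": [
            {"name": "Ana", "years_exp": 7},
            {"name": "Ben", "years_exp": 3},
        ],
    },
    {"store_id": 2, "employees": [{"name": "Cy", "years_exp": 5}]},
]

result = add_exp_employees(stores)
```

The filter is applied once per row, so each store gets its own `exp_employees` array. A store with no qualifying employees gets an empty array rather than being dropped.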
Q: 5
In order for Structured Streaming to reliably track the exact progress of the processing so that it can
handle any kind of failure by restarting and/or reprocessing, which of the following two approaches
is used by Spark to record the offset range of the data being processed in each trigger?
Options
Q: 6
A data analyst has a series of queries in a SQL program. The data analyst wants this program to run
every day. They only want the final query in the program to run on Sundays. They ask for help from
the data engineering team to complete this task.
Which of the following approaches could be used by the data engineering team to complete this
task?
Options
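One way to picture the requirement is a daily job whose control flow skips the final query unless the run date is a Sunday. Here is a minimal sketch, assuming the program's queries are held as strings in a Python list; the function name `queries_to_run` and the placeholder query names are hypothetical.

```python
from datetime import date


def queries_to_run(queries, run_date):
    """Return every query on Sundays, and all but the final query on other days."""
    # date.weekday() numbers Monday as 0 and Sunday as 6.
    if run_date.weekday() == 6:
        return list(queries)
    return list(queries[:-1])


program = ["q1", "q2", "q_final"]
sunday_plan = queries_to_run(program, date(2024, 1, 7))   # a Sunday
monday_plan = queries_to_run(program, date(2024, 1, 8))   # a Monday
```

A scheduler would still trigger the job every day; only the conditional decides whether the last query executes.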
Q: 7
A data organization leader is upset about the data analysis team’s reports being different from the
data engineering team’s reports. The leader believes the siloed nature of their organization’s data
engineering and data analysis architectures is to blame.
Which of the following describes how a data lakehouse could alleviate this issue?
Options
Q: 8
A data engineer and data analyst are working together on a data pipeline. The data engineer is
working on the raw, bronze, and silver layers of the pipeline using Python, and the data analyst is
working on the gold layer of the pipeline using SQL. The raw source of the pipeline is a streaming
input. They now want to migrate their pipeline to use Delta Live Tables.
Which of the following changes will need to be made to the pipeline when migrating to Delta Live
Tables?
Options
Q: 9
A new data engineering team has been assigned to an ELT project. The new data engineering
team will need full privileges on the table sales to fully manage the project.
Which of the following commands can be used to grant full permissions on the database to the new
data engineering team?
Options
Q: 10
Which query is performing a streaming hop from raw data to a Bronze table?
A)
B)
C)
D)

Options