I don't think it's C here. With 'ON VIOLATION FAIL UPDATE', the whole job fails if any bad record is found, so A fits better. C's a common trap for 'DROP RECORD' cases. Pretty sure on this, correct me if I'm missing something.
Q: 11
A dataset has been defined using Delta Live Tables and includes an expectations clause:
CONSTRAINT valid_timestamp EXPECT (timestamp > '2020-01-01') ON VIOLATION FAIL UPDATE
What is the expected behavior when a batch of data containing data that violates these constraints is
processed?
Options
Discussion
A, but not 100% sure. The wording around 'FAIL UPDATE' makes me think the whole batch fails if there's a violation, so A feels most likely. If it were 'DROP RECORD', it'd probably be C instead. Open to correction if I'm off.
Makes sense to me, I'd go with Option C for this one.
A. Option C is actually a trap here, since 'FAIL UPDATE' should cause the job to fail, not just drop invalid records.
C
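To make the difference between the two violation modes concrete, here's a minimal plain-Python sketch of the semantics (illustrative only, not actual Delta Live Tables code; the function names and record shapes are made up):

```python
# Sketch of DLT expectation semantics, NOT the real DLT API.
# ON VIOLATION FAIL UPDATE: the whole update fails on the first bad record.
def fail_update(records, constraint):
    for r in records:
        if not constraint(r):
            raise ValueError(f"Expectation violated by record: {r}")
    return records

# ON VIOLATION DROP ROW: bad records are silently filtered out instead.
def drop_record(records, constraint):
    return [r for r in records if constraint(r)]

# The constraint from the question: timestamp > '2020-01-01'
valid_timestamp = lambda r: r["timestamp"] > "2020-01-01"
batch = [{"timestamp": "2021-05-01"}, {"timestamp": "2019-12-31"}]

print(drop_record(batch, valid_timestamp))  # keeps only the 2021 record
# fail_update(batch, valid_timestamp)       # would raise: entire update fails
```

That's why one bad record under FAIL UPDATE takes down the whole job, while DROP ROW would just shrink the batch.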
Be respectful. No spam.
Q: 12
Which file format is used for storing Delta Lake tables?
Options
Discussion
Option A. The official guide and Databricks docs both confirm Parquet is used as the Delta Lake storage format.
Option A. Under the hood Delta Lake tables are stored as Parquet files, just with extra metadata and transaction log. Pretty sure about this, seen it in official guide and labs.
A, tbh. Delta Lake adds features, but the actual files on storage are still Parquet. The "Delta" part is mostly about the transaction log and metadata. Anyone disagree or see a case where that's not true?
Does "file format" here mean what's on disk, or the type defined in table creation? Little unclear how they're framing it.
I don't think it's B. A is correct here since Delta Lake uses Parquet files for actual storage, and the word "Delta" in option B is more about how Databricks handles transactions and versioning on top. Easy to get tripped up by the naming, though: B looks tempting, but if you check the filesystem, you'll see Parquet files. Anyone prefer B for another reason?
Not gonna lie, this always confused me at first too. It's A in this case since the files themselves are Parquet on disk. "Delta" refers more to the transaction/log layer. If anyone has seen it different in the newer Databricks updates, would love a sanity check.
Wow Databricks and their naming, always trips people up. It's actually A, since Delta Lake tables store data as Parquet files under the hood. Makes sense if you've looked at the filesystem. But I get why B looks tempting.
It's A; the trap here is B, since you work with 'Delta' tables in Databricks but the underlying storage is Parquet files. I've seen similar wording in practice exams; it always points to Parquet as the actual file type. Could be confusing if you only think about what users see. Let me know if anyone thinks otherwise!
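If you're curious what "Parquet under the hood" looks like, here's a small Python sketch over a made-up directory listing (the file names are hypothetical): a Delta table's directory holds plain Parquet data files plus a _delta_log folder of JSON commit files.

```python
# Hypothetical listing of a Delta table's storage directory (names invented).
listing = [
    "events/_delta_log/00000000000000000000.json",    # JSON transaction log
    "events/part-00000-0a1b2c3d.c000.snappy.parquet", # data file: Parquet
    "events/part-00001-4e5f6a7b.c000.snappy.parquet", # data file: Parquet
]

def data_file_formats(paths):
    """Return the set of file extensions for data files, ignoring the log."""
    return {p.rsplit(".", 1)[-1] for p in paths if "_delta_log" not in p}

print(data_file_formats(listing))  # {'parquet'}
```

So the only format actually holding table data is Parquet; "Delta" is the log/metadata layer on top.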
Q: 13
Which of the following describes a scenario in which a data team will want to utilize cluster pools?
Options
Discussion
A. B is tempting if you're only thinking about reproducibility, but pools are really for speed (minimizing cluster startup time). If the report needs to run ASAP, A makes sense here.
C or D? I remember a similar question from the official practice test, and both testing (C) and versioning (D) come up a lot. Might be missing something but the guide wasn't super clear here.
Don’t think it’s B; that one traps folks. A is right since pools focus on speeding up cluster spin-up for fast reporting.
It's B. Saw a similar question in practice sets, and the wording on this one is super clear.
Q: 14
Identify how the count_if function and count(x) behave when x contains NULLs.
Consider a table random_values with the following data in column col1:
0, 1, 2, NULL, 2, 3
What would be the output of the query below?
SELECT count_if(col1 > 1) AS count_a, count(*) AS count_b, count(col1) AS count_c FROM random_values;
Options
Discussion
A. The official exam guide covers count_if and its quirks pretty well. Practice sets can help with NULL logic too.
A for sure. count_if only tallies up when the condition is true, not just non-null, and count(*) always gives the total rows even if there are NULLs. Pretty sure that fits the expected values. If someone sees it differently let me know.
C/D? I feel like it's one of those, mainly because count_if can be tricky if the example data includes NULLs. My guess is D since usually count(*) covers all and count(col1) skips NULLs, but not totally sure if there are three or four hits for col > 1. If I'm missing a detail in the sample table let me know.
D imo
It's A. The trap here is thinking count_if counts NULLs, but it doesn't; it only counts where the condition is true. Also, count(*) includes NULLs, while count(col1) skips those. So with three values > 1, six total rows, and five non-null col1 entries, A matches up. Pretty sure this lines up with the Databricks docs, but happy to hear other takes.
A
A makes sense, count_if(col > 1) will only count rows meeting that condition and skips NULL by default. count(*) always counts everything, but count(col1) drops NULLs. Seen this type of logic in other Databricks sets. If I’m off, open to correction.
B or A. With count_if(col > 1), you have to watch for NULLs because if a value is NULL, that row doesn't get counted even if col1 exists. In similar Databricks questions, three is right when there's only three values strictly greater than 1 (not just non-NULLs). Really depends on the sample data. If there are hidden NULLs or zeros it's easy to trip up here.
D. I figured count_if(col > 1) would grab four values if there isn’t an extra NULL tripping things up, and count(*) always grabs all six. Not 100% but the logic felt right to me, anyone disagree?
B or D. I thought count_if(col > 1) would give 4 because I assumed there are four values greater than 1 in the example data, not three. The count(*) and count(col1) totals make sense for 6 and 5, but that first number keeps tripping me up. Maybe I'm missing a NULL trap in the sample? Let me know if anyone reads it differently.
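For anyone second-guessing the counts, here's a quick plain-Python simulation of the three aggregates over the col1 values shown above (not Spark, just the same NULL semantics):

```python
# col1 values from the random_values table; None stands in for SQL NULL.
col1 = [0, 1, 2, None, 2, 3]

# count_if(col1 > 1): counts rows where the condition is true; NULLs never match.
count_a = sum(1 for v in col1 if v is not None and v > 1)

# count(*): counts every row, NULLs included.
count_b = len(col1)

# count(col1): counts only non-NULL values.
count_c = sum(1 for v in col1 if v is not None)

print(count_a, count_b, count_c)  # 3 6 5
```

Only 2, 2, and 3 are strictly greater than 1, so the first number is 3, not 4.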
Q: 15
Which of the following is stored in the Databricks customer's cloud account?
Options
Discussion
D. Seen the same logic in the official guide and practice test sets; the official docs clarify this.
D imo. Saw similar question in practice exams, data is the only thing actually in the customer's cloud account.
Probably D since customer data stays in their own cloud storage like S3 or ADLS. The rest (web app, notebooks, etc.) are on Databricks managed infra. Pretty sure that's how it's designed.
C or E here. Repos or Notebooks could get stored in the customer cloud if workspace storage is enabled, right? The question wording feels tricky, so not 100% sure.
Not convinced it's just D. E (Notebooks) could live in customer workspace storage too if configured that way, common trap.
Pretty sure it's D. Only the data itself sits inside the customer's cloud account, like S3 or Azure Data Lake. The rest of those options are managed by Databricks. Someone correct me if you’ve seen different behavior?
I’d say D is correct. Only "Data" sits in the customer's own cloud account (like S3 or ADLS), rest like notebooks and cluster configs stay with Databricks infra. I saw something like this on a practice test, but happy to be corrected if I'm missing any edge case.
Q: 16
Which of the following Git operations must be performed outside of Databricks Repos?
Options
Discussion
Option E
E here. Had something like this in a mock and merge was always outside Repos, while commits, pulls, and pushes worked fine inside. Pretty sure that's right, but open if anyone's seen different on recent exams.
E. Merge can’t be done inside Databricks Repos; the rest are supported. Clone is a trap here.
I don’t think D is right here. E is the one you have to do outside Repos.
Clone is doable but merging definitely isn't supported directly in Databricks Repos. E, not D.
Q: 17
A data engineer needs to determine whether to use the built-in Databricks Notebooks versioning or
version their project using Databricks Repos.
Which of the following is an advantage of using Databricks Repos over the Databricks Notebooks
versioning?
Options
Discussion
C vs B, but B is the real advantage here. Databricks Repos actually lets you use multiple branches like you would expect with any proper Git integration, supports team workflows. C sounds tempting since both tools offer some rollback features, but Notebooks versioning doesn't support branching at all. Pretty sure it's B unless something changed in latest updates.
I don’t think it’s D. B gives you multi-branch workflows, which Notebooks versioning lacks, so B.
Yeah totally B for this one.
Maybe B, but wondering if the answer flips if the question was more about commenting features instead of branch support.
B, saw similar in a practice set tied to repo branching support.
B vs C, but B fits best since Repos lets you use multiple branches. That's something the old Notebooks versioning doesn't offer.
Q: 18
Which of the following commands can be used to write data into a Delta table while avoiding the
writing of duplicate records?
Options
Discussion
C is the one that handles deduplication since MERGE lets you match on keys and only updates or inserts if needed. APPEND or INSERT could just write duplicates straight in. Pretty sure C is right here, but curious if anyone sees a use case I missed.
Why do Databricks questions always sneak in MERGE for this? C
C not D. APPEND (D) doesn't check for duplicates at all, that's a common trap here. Pretty sure MERGE is what prevents duplicate inserts, unless you misconfigure the match. Feel free to correct me if I'm missing something.
Merge is the way to avoid dups since it does upserts based on match conditions. C here, since things like APPEND or INSERT just slam new rows in without checking. Almost positive C, unless someone knows a weird edge case.
Not A, C here. MERGE lets you match and update or insert only if not present, so it avoids duplicates. The rest just add or drop data without checking for existing rows, at least from what I've seen in exam dumps.
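A toy sketch of why MERGE-style upserts avoid duplicates while appends don't (plain Python with a made-up merge helper, not Delta's actual MERGE INTO syntax):

```python
# Toy upsert: match on a key, update when matched, insert when not matched.
# Re-processing the same source rows never creates duplicate keys, unlike
# a blind append that just adds every incoming row.
def merge(target, source, key):
    by_key = {row[key]: row for row in target}
    for row in source:
        by_key[row[key]] = row  # update if matched, insert otherwise
    return list(by_key.values())

target = [{"id": 1, "val": "a"}, {"id": 2, "val": "b"}]
source = [{"id": 2, "val": "b2"}, {"id": 3, "val": "c"}]

print(merge(target, source, "id"))  # 3 rows: id 2 updated, id 3 inserted
```

An APPEND of the same source would leave two rows with id 2, which is exactly the duplicate problem the question is about.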
Q: 19
A data engineer needs to create a table in Databricks using data from a CSV file at location
/path/to/csv.
They run the following command:
Which of the following lines of code fills in the above blank to successfully complete the task?
Options
Discussion
B. USING CSV is right for Databricks. FROM CSV is an easy trap, seen that in similar practice sets before.
B. Databricks SQL expects USING CSV to define the file format when creating an external table. FROM is for specifying the path but not the format, so FROM CSV wouldn't make sense syntactically. Seen this in practice tests too, but open to correction if I'm missing something.
It's C; this one is super straightforward compared to similar questions I’ve seen.
Trick is format vs path, so B in this specific SQL context.
B. USING CSV specifies the format for the source data, which is required in Databricks CREATE TABLE commands. FROM CSV is a common trap but it's not valid syntax. Seen this catch folks out before; agree with the others here.
Had something like this in a mock, pretty sure it's B. In Databricks SQL, you use USING CSV to specify the file format when creating the table.
Q: 20
Which of the following statements regarding the relationship between Silver tables and Bronze
tables is always true?
Options
Discussion
No way it's C; the amount of data in Silver can change depending on transformations or joins. D is the one that's always true, since Silver tables are meant to be more refined and cleaner than Bronze. Unless someone thinks Bronze has better data quality, which doesn't make sense here.
D imo
I remember a similar scenario from labs, and D was always the correct pick there.
Pretty sure it's D. Silver tables always have cleaner and more refined data compared to Bronze, that's the main idea behind medallion architecture. The other options talk about size or aggregation, but only D is always true.
C or D
Option C is tempting since sometimes Silver could have more rows if there are joins or enrichment, but that's not guaranteed. D is always true because Silver is always more refined/cleaner than Bronze in the medallion setup. I think D is the safest, but if we're talking row counts, C could trip you up. Agree?
D always. Silver tables get cleaned up data from Bronze, that's basic medallion architecture. Haven't seen any Databricks docs contradict this.
C/D? Not confident, because row counts could change with joins, but quality is always better in Silver.
Probably C, saw a similar question and got tripped up by that. Seemed like Silver could sometimes have more data after joins.
D imo. Silver tables are always more refined than Bronze, that’s the core of medallion architecture. Option A is a common trap since it flips the true relationship. Pretty sure D is always right for Databricks.
D
Really clear options on this one, nice question format!