Free Practice Test

Free Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 Practice Questions – 2025 Updated


Q: 1
A Spark application is experiencing performance issues in client mode because the driver is resource-constrained. How should this issue be resolved?
Options
Q: 2
How can a Spark developer ensure optimal resource utilization when running Spark jobs in Local Mode for testing?
Options
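For context, a minimal sketch of starting a session in Local Mode, assuming the goal is to use all available cores on the machine (the app name is hypothetical):

    from pyspark.sql import SparkSession

    # "local[*]" asks Spark to run with as many worker threads as there are
    # logical cores on the local machine, so all cores are used for testing.
    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("local-mode-test")
        .getOrCreate()
    )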
Q: 3
An MLOps engineer is building a Pandas UDF that applies a language model that translates English strings into Spanish. The initial code is loading the model on every call to the UDF, which is hurting the performance of the data pipeline. The initial code is:

    def in_spanish_inner(df: pd.Series) -> pd.Series:
        model = get_translation_model(target_lang='es')
        return df.apply(model)

    in_spanish = sf.pandas_udf(in_spanish_inner, StringType())

How can the MLOps engineer change this code to reduce how many times the language model is loaded?
Options
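One common way to cut down model loads (a sketch, not necessarily the intended answer choice) is the iterator-of-Series Pandas UDF, which loads the model once per executor process rather than once per batch; get_translation_model is the helper assumed by the question:

    from typing import Iterator
    import pandas as pd
    from pyspark.sql import functions as sf
    from pyspark.sql.types import StringType

    @sf.pandas_udf(StringType())
    def in_spanish(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
        # Load the translation model once per worker, then reuse it for every batch.
        model = get_translation_model(target_lang='es')
        for batch in batches:
            yield batch.apply(model)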
Q: 4
A developer wants to test Spark Connect with an existing Spark application. What are the two alternative ways the developer can start a local Spark Connect server without changing their existing application code? (Choose 2 answers)
Options
Q: 5
A data scientist has identified that some records in the user profile table contain null values in any of the fields, and such records should be removed from the dataset before processing. The schema of the user profile table includes fields such as user_id, username, date_of_birth, and created_ts. Which block of Spark code can be used to achieve this requirement?
Options
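As a point of reference, a minimal sketch of dropping rows that contain a null in any column, assuming the table has been loaded into user_profile_df (a hypothetical name):

    # Remove any row with a null in any of its columns; dropna(how="any") is equivalent.
    clean_df = user_profile_df.na.drop("any")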
Q: 6
A data engineer is reviewing a Spark application that applies several transformations to a DataFrame but notices that the job does not start executing immediately. Which two characteristics of Apache Spark's execution model explain this behavior? (Choose 2 answers)
Options
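For illustration, a small sketch of the lazy-evaluation behavior the question describes: transformations only build a plan, and nothing runs until an action is invoked.

    from pyspark.sql import functions as F

    df = spark.range(1_000_000)                 # transformation: no job started
    evens = df.filter(F.col("id") % 2 == 0)     # still no job
    print(evens.count())                        # action: Spark now executes the plan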
Q: 7
What is the relationship between jobs, stages, and tasks during execution in Apache Spark?
Options
Q: 8
Which command overwrites an existing JSON file when writing a DataFrame?
Options
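For reference, a minimal sketch of overwriting JSON output with the DataFrameWriter (the output path is hypothetical):

    # mode("overwrite") replaces whatever already exists at the target path.
    df.write.mode("overwrite").json("/tmp/output/events_json")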
Q: 9
A data engineer is working on a real-time analytics pipeline using Apache Spark Structured Streaming. The engineer wants to process incoming data and ensure that triggers control when the query is executed. The system needs to process data in micro-batches with a fixed interval of 5 seconds. Which code snippet could the data engineer use to fulfill this requirement? (Answer choices A–D are code snippets presented as images and are not reproduced here.)
Options
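As context, a minimal sketch of a processing-time trigger, assuming df is a streaming DataFrame and the console sink is used only for illustration:

    # Fire a micro-batch every 5 seconds.
    query = (
        df.writeStream
        .format("console")
        .trigger(processingTime="5 seconds")
        .start()
    )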
Q: 10
Given the code:

    df = spark.read.csv("large_dataset.csv")
    filtered_df = df.filter(col("error_column").contains("error"))
    mapped_df = filtered_df.select(split(col("timestamp"), " ").getItem(0).alias("date"), lit(1).alias("count"))
    reduced_df = mapped_df.groupBy("date").sum("count")
    reduced_df.count()
    reduced_df.show()

At which point will Spark actually begin processing the data?
Options
Q: 11
A data engineer has noticed that upgrading the Spark version in their applications from Spark 3.0 to Spark 3.5 has improved the runtime of some scheduled Spark applications. Looking further, the data engineer realizes that Adaptive Query Execution (AQE) is now enabled. Which operation could AQE be applying to automatically improve the Spark application's performance?
Options
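For background, a sketch of the configuration flags behind the runtime optimizations AQE can apply (post-shuffle partition coalescing and skew-join handling); AQE is enabled by default in recent Spark 3.x releases:

    spark.conf.set("spark.sql.adaptive.enabled", "true")
    spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
    spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")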
Q: 12
Which Spark configuration controls the number of tasks that can run in parallel on an executor?
Options
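As a reference point, a sketch of the settings that govern per-executor task parallelism (the values shown are illustrative): an executor can run roughly spark.executor.cores / spark.task.cpus tasks at once.

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .config("spark.executor.cores", "4")   # cores available to each executor
        .config("spark.task.cpus", "1")        # cores claimed by each task
        .getOrCreate()
    )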
Q: 13
Given the code fragment:

    import pyspark.pandas as ps
    psdf = ps.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

Which method is used to convert a Pandas API on Spark DataFrame (pyspark.pandas.DataFrame) into a standard PySpark DataFrame (pyspark.sql.DataFrame)?
Options
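For context, a minimal sketch of the conversion in the pyspark.pandas API, continuing from the fragment above:

    # to_spark() returns a standard pyspark.sql.DataFrame backed by the same data.
    sdf = psdf.to_spark()
    sdf.printSchema()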
Q: 14
What is the risk associated with converting a large Pandas API on Spark DataFrame back to a Pandas DataFrame?
Options
Q: 15
A developer notices that all the post-shuffle partitions in a dataset are smaller than the value set for spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold. Which type of join will Adaptive Query Execution (AQE) choose in this case?
Options
Q: 16
What is the benefit of using Pandas on Spark for data transformations?
Options
Q: 17
A data engineer wants to write a Spark job that creates a new managed table. If the table already exists, the job should fail and not modify anything. Which save mode and method should be used?
Options
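For reference, a minimal sketch of a write that fails if the managed table already exists (the table name is hypothetical):

    # "errorifexists" (also spelled "error") is the default save mode and raises
    # an error instead of modifying an existing table.
    df.write.mode("errorifexists").saveAsTable("sales_db.orders")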
Q: 18
What is the benefit of using Pandas API on Spark for data transformations?
Options
Q: 19
Given the schema:

    event_ts TIMESTAMP, sensor_id STRING, metric_value LONG, ingest_ts TIMESTAMP, source_file_path STRING

The goal is to deduplicate records based on event_ts, sensor_id, and metric_value.
Options
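As a point of reference, a minimal sketch of deduplicating on a subset of columns with dropDuplicates, assuming df holds the data described by the schema above:

    deduped_df = df.dropDuplicates(["event_ts", "sensor_id", "metric_value"])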
Q: 20
Which UDF implementation calculates the length of strings in a Spark DataFrame?
Options
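For illustration, a sketch of a Python UDF that returns string lengths (the column name is hypothetical; Spark also ships a built-in length() function for this):

    from pyspark.sql import functions as F
    from pyspark.sql.types import IntegerType

    # Return the length of each string, passing nulls through unchanged.
    string_len = F.udf(lambda s: len(s) if s is not None else None, IntegerType())
    df_with_len = df.withColumn("name_len", string_len(F.col("name")))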
Q: 21
A data scientist wants to ingest a directory full of plain text files so that each record in the output DataFrame contains the entire contents of a single file and the full path of the file the text was read from. The first attempt does read the text files, but each record contains a single line. This code is shown below:

    txt_path = "/datasets/raw_txt/*"
    df = spark.read.text(txt_path)  # one row per line by default
    df = df.withColumn("file_path", input_file_name())  # add full path

Which code change produces a DataFrame that meets the data scientist's requirements?
Options
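One way to get one record per file (a sketch, not necessarily the intended answer choice) is the wholetext option of the text reader, keeping the input_file_name() column from the original attempt:

    from pyspark.sql.functions import input_file_name

    txt_path = "/datasets/raw_txt/*"
    df = spark.read.text(txt_path, wholetext=True)        # one row per file
    df = df.withColumn("file_path", input_file_name())    # full path of the source file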
Q: 22
Which components of Apache Spark's architecture are responsible for carrying out the tasks assigned to them?
Options
Q: 23
A Spark application needs to read multiple Parquet files from a directory where the files have differing but compatible schemas. The data engineer wants to create a DataFrame that includes all columns from all files. Which code should the data engineer use to read the Parquet files and include all columns using Apache Spark?
Options
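For context, a sketch of reading Parquet files with schema merging so the result carries the union of all columns (the directory path is hypothetical):

    df = spark.read.option("mergeSchema", "true").parquet("/data/parquet_dir")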
Q: 24
An engineer has two DataFrames: df1 (small) and df2 (large). A broadcast join is used:

    from pyspark.sql.functions import broadcast
    result = df2.join(broadcast(df1), on='id', how='inner')

What is the purpose of using broadcast() in this scenario?
Options
Q: 25
What is the benefit of Adaptive Query Execution (AQE)?
Options