Q: 1
A data engineer has configured a Structured Streaming job to read from a table, manipulate the data,
and then perform a streaming write into a new table.
The code block used by the data engineer is below:
If the data engineer only wants the query to process all of the available data in as many batches as
required, which of the following lines of code should the data engineer use to fill in the blank?
Discussion
Makes sense to pick B here. trigger(availableNow=True) is meant for processing all available data in multiple batches if needed.
Ugh, Databricks changing syntax again. Option B
I'd actually pick D here. In my experience, trigger(processingTime="once") will process all the currently available data in one go and then stop, which feels like what they're asking for. Could be wrong if they're expecting multiple batches though. Anyone thinking the same?
D
B is right here since trigger(availableNow=True) makes the job process all existing data in as many batches as needed, which matches the requirement. Official docs and practice tests both highlight this option. Pretty sure, but let me know if you see it differently.
It's B, not D. processingTime="once" does just one batch, but availableNow gets all data in as many batches as necessary.
D or E, leaning D since the processingTime looks like a trap option here.
Not D, B. B is right since trigger(availableNow=True) processes all available data in as many batches as needed.
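For anyone who wants to see the shape of it: the original code block isn't shown in this thread, so here is only a rough sketch of where the trigger goes in a PySpark streaming write. The function name, checkpoint path, and table name are made up for illustration, and as far as I know availableNow needs Spark 3.3 or later in PySpark.

```python
def write_all_available(stream_df, checkpoint_path, target_table):
    """Start a streaming write that drains the currently available data, then stops.

    `stream_df` is assumed to be a streaming DataFrame; the function name and
    arguments are hypothetical, not taken from the question.
    """
    return (
        stream_df.writeStream
        .option("checkpointLocation", checkpoint_path)  # needed so progress is tracked
        .outputMode("append")
        # availableNow=True: process everything available at start,
        # in as many micro-batches as required, then stop.
        # For contrast: trigger(once=True) uses a single batch, and
        # trigger(processingTime="10 seconds") keeps running on an interval.
        .trigger(availableNow=True)
        .toTable(target_table)  # starts the query and returns a StreamingQuery
    )
```

You would call it with something like write_all_available(spark.readStream.table("source_table"), "/tmp/checkpoints/demo", "new_table"), where both table names and the path are placeholders.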