Q: 2
A data engineer has configured a Structured Streaming job to read from a table, manipulate the data,
and then perform a streaming write into a new table.
The code block used by the data engineer is below:
If the data engineer only wants the query to execute a micro-batch to process data every 5 seconds,
which of the following lines of code should the data engineer use to fill in the blank?
Discussion
D. Some folks might get tricked by E but that's for continuous, not actual micro-batching. Only D uses processingTime and matches Databricks docs. If anyone thinks otherwise, open to hearing it.
It's E.
D imo
So for micro-batch processing specifically, only D actually sets up a 5-second micro-batch interval. The other options either don't use the right trigger syntax or use continuous mode instead. Pretty sure about this since it's straight from the Databricks docs.
Why do they always sneak in weird syntax options? D is the right one here, since processingTime="5 seconds" sets the micro-batch trigger just like in the docs. I was a little torn when I first saw E, but that's for continuous mode, not micro-batch.
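For reference, a minimal PySpark sketch of where the option D trigger goes in a streaming write to a table. The original code block isn't reproduced in this thread, so the function name, table name, and checkpoint path are hypothetical. `trigger(processingTime=...)` and `toTable(...)` are standard `DataStreamWriter` methods (`toTable` needs Spark 3.1+).

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # type hints only; the caller supplies a real streaming DataFrame
    from pyspark.sql import DataFrame
    from pyspark.sql.streaming import StreamingQuery


def write_every_5_seconds(df: DataFrame, target_table: str, checkpoint_dir: str) -> StreamingQuery:
    """Hypothetical helper: start a streaming write that runs one micro-batch every 5 seconds."""
    return (
        df.writeStream
        .option("checkpointLocation", checkpoint_dir)  # streaming table writes need a checkpoint
        # Option D: fixed-interval micro-batch trigger.
        # .trigger(continuous="5 seconds") would instead run continuous processing
        # with a 5-second checkpoint interval, which is not micro-batching.
        .trigger(processingTime="5 seconds")
        .toTable(target_table)  # starts the query and returns a StreamingQuery
    )
```

Usage would look something like `write_every_5_seconds(spark.readStream.table("source_table"), "new_table", "/tmp/checkpoints/new_table")`, again with made-up names.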
C vs D? Both have time specs but I think only D (processingTime="5 seconds") directly controls micro-batch intervals in Structured Streaming. C looks like invalid syntax, and E is for continuous mode, not micro-batching. Not 100% sure though, anyone disagree?
Makes sense to use D here. Structured Streaming's processingTime is what triggers micro-batches every 5 seconds. E would be for continuous mode, which isn't micro-batching, so pretty sure D fits the requirement best.
Option E since continuous trigger should also process at intervals, right? Not sure if that counts as micro-batch though.
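On what "every 5 seconds" means for a processingTime trigger: the Structured Streaming guide says that if a micro-batch finishes within the interval, the engine waits until the interval is over before starting the next one, and if a batch overruns, the next one starts as soon as the previous one completes. Below is a toy pure-Python model of that rule, not Spark code. It assumes interval boundaries fall at multiples of the interval starting from t=0, and the batch durations are made up.

```python
from __future__ import annotations

import math


def microbatch_start_times(interval_s: float, durations_s: list[float]) -> list[float]:
    """Toy model: start time of each micro-batch under a fixed processing-time trigger."""
    starts: list[float] = []
    t = 0.0
    for duration in durations_s:
        starts.append(t)
        # Next interval boundary strictly after this batch started (assumed aligned to 0).
        next_boundary = math.floor(t / interval_s) * interval_s + interval_s
        # Wait for that boundary, unless the batch ran past it; then start right away.
        t = max(t + duration, next_boundary)
    return starts


# A 7-second batch overruns its 5-second slot, so the next batch starts at 17 instead of 15.
print(microbatch_start_times(5, [1, 2, 7, 1]))  # [0.0, 5.0, 10.0, 17.0]
```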