Q: 5
In order for Structured Streaming to reliably track the exact progress of the processing so that it can
handle any kind of failure by restarting and/or reprocessing, which of the following two approaches
is used by Spark to record the offset range of the data being processed in each trigger?
Options
Discussion
Option A. I don’t think it’s C: checkpointing and WALs are the offset-tracking pieces. Idempotent sinks trip folks up here.
Yeah that's A. Checkpointing and write-ahead logs are the Spark way to keep track of offsets every trigger.
A, if I had to pick. Had something like this in a mock: Spark uses checkpointing plus write-ahead logs to track streaming state and offsets for restarts. Idempotent sinks aren't about offset tracking directly. Pretty confident but open to correction if specs changed.
D. Had something like this in a mock and it matched.
A. Spark uses checkpointing and write-ahead logs to make sure streaming jobs can pick up the right offset after failure. Idempotent sinks aren't for tracking; they're more about output safety. If I'm mixing things up let me know.
Yeah solid, checkpointing plus WAL is how Spark keeps offset tracking resilient. A
It's A. Checkpointing plus write-ahead logs let Spark reliably pick up from the last seen offset if a failure happens. Idempotent sinks are mostly for output reliability, not offset tracking. I think this is spot on for how SS works, but correct me if I'm off.
Probably A. From what I remember, Spark records offset ranges using both checkpointing and write-ahead logs to make recovery solid after a failure. Idempotent sinks matter for output guarantees, but they're not used for tracking progress itself I think. Anyone see otherwise?
A is what I've seen in the docs too. Spark uses both checkpointing and write-ahead logs to track offsets for fault tolerance so it can recover from failures. Pretty sure that's the best fit here, unless something changed recently.
C vs E? Both mention idempotent sinks but only replayable sources are needed for full recovery, right? Not 100 percent sure here.
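For anyone still mixing up the three pieces (offset tracking, replayable sources, idempotent sinks), here is a rough toy model in plain Python. It is only a sketch under assumptions, not Spark's actual implementation: the class ToyStream, its method names, and the crash_before_commit flag are all made up for illustration. The offsets/ and commits/ subdirectories loosely mirror what a Structured Streaming checkpoint directory contains. The idea shown: the offset range is written to the log before the batch is processed, so a restart re-runs the same range, and the sink keyed by batch id makes the re-run harmless.

```python
import json
import os
import tempfile


class ToyStream:
    """Toy model of per-trigger offset tracking (hypothetical, not Spark code)."""

    def __init__(self, source, checkpoint_dir, sink):
        self.source = source  # replayable source: any offset range can be re-read
        self.sink = sink      # dict keyed by batch id, so rewrites are idempotent
        self.offsets_dir = os.path.join(checkpoint_dir, "offsets")
        self.commits_dir = os.path.join(checkpoint_dir, "commits")
        os.makedirs(self.offsets_dir, exist_ok=True)
        os.makedirs(self.commits_dir, exist_ok=True)

    @staticmethod
    def _latest(directory):
        # Highest batch id recorded in a log directory, or -1 if empty.
        ids = [int(name) for name in os.listdir(directory)]
        return max(ids) if ids else -1

    def _read_range(self, batch_id):
        with open(os.path.join(self.offsets_dir, str(batch_id))) as f:
            return json.load(f)

    def run_trigger(self, batch_size, crash_before_commit=False):
        last_planned = self._latest(self.offsets_dir)
        last_committed = self._latest(self.commits_dir)

        if last_planned > last_committed:
            # Recovery: a batch was planned but never committed.
            # Re-run it with exactly the same offset range.
            batch_id = last_planned
            rng = self._read_range(batch_id)
        else:
            batch_id = last_committed + 1
            start = self._read_range(last_committed)["end"] if last_committed >= 0 else 0
            end = min(start + batch_size, len(self.source))
            rng = {"start": start, "end": end}
            # Write-ahead: record the offset range BEFORE processing the data.
            with open(os.path.join(self.offsets_dir, str(batch_id)), "w") as f:
                json.dump(rng, f)

        # Idempotent sink: writing the same batch id twice overwrites, never duplicates.
        self.sink[batch_id] = self.source[rng["start"]:rng["end"]]

        if crash_before_commit:
            raise RuntimeError("simulated failure after sink write")

        # Mark the batch as done only after the sink write succeeded.
        open(os.path.join(self.commits_dir, str(batch_id)), "w").close()
        return batch_id
```

Rough usage: run one trigger, run a second with crash_before_commit=True, then build a new ToyStream on the same checkpoint directory and trigger again. The new instance picks up the uncommitted batch from the offsets log and replays the same range, which is the part answer A is about; the dict-keyed sink is the separate "output safety" piece.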