Q: 5
In order for Structured Streaming to reliably track the exact progress of processing so that it can
handle any kind of failure by restarting and/or reprocessing, which two of the following approaches
does Spark use to record the offset range of the data being processed in each trigger?
Options
Discussion
Probably A. From what I remember, Spark records offset ranges using both checkpointing and write-ahead logs so it can recover reliably after a failure. Idempotent sinks matter for end-to-end output guarantees, but I don't think they're used for tracking progress itself. Anyone see otherwise?
A is what I've seen in the docs too: Spark uses checkpointing and write-ahead logs together to record the offsets consumed in each trigger, which is what lets it recover from failures. Pretty sure that's the best fit here, unless something changed recently.
C vs E? Both mention idempotent sinks, but isn't it replayable sources that are needed for full recovery? Not 100 percent sure here.
Not C, A
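To make the idea behind answer A concrete, here's a toy sketch in plain Python of how durably logging each trigger's offset range before processing enables exact recovery after a restart. This is an illustration of the checkpoint/write-ahead-log pattern only, not Spark's actual implementation; the `OffsetLog` class and its methods are made up for this example.

```python
import json
import os
import tempfile

class OffsetLog:
    """Toy write-ahead log for offset ranges (NOT Spark's real code).
    Each trigger's offset range is durably written *before* the batch
    is processed, so a restarted query knows exactly which data range
    to reprocess."""

    def __init__(self, checkpoint_dir):
        self.dir = checkpoint_dir
        os.makedirs(checkpoint_dir, exist_ok=True)

    def commit(self, batch_id, start_offset, end_offset):
        # Record the offset range for this trigger ahead of processing.
        path = os.path.join(self.dir, f"{batch_id}.json")
        with open(path, "w") as f:
            json.dump({"start": start_offset, "end": end_offset}, f)

    def latest(self):
        # On restart, recover the most recently logged offset range.
        batches = [int(name.split(".")[0]) for name in os.listdir(self.dir)]
        if not batches:
            return None
        last = max(batches)
        with open(os.path.join(self.dir, f"{last}.json")) as f:
            return last, json.load(f)

# Usage: log two triggers, then simulate a restart and recovery.
ckpt = tempfile.mkdtemp()
log = OffsetLog(ckpt)
log.commit(0, 0, 100)
log.commit(1, 100, 250)
batch_id, offsets = OffsetLog(ckpt).latest()  # fresh instance = "restarted" query
print(batch_id, offsets)  # → 1 {'start': 100, 'end': 250}
```

In real Spark, you get this behavior by setting the `checkpointLocation` option on the streaming query; the engine persists offset ranges to that directory and replays the in-flight batch from a replayable source after a failure.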