Q: 8
A company has implemented a data ingestion pipeline for sales transactions from its ecommerce
website. The company uses Amazon Data Firehose to ingest data into Amazon OpenSearch Service.
The buffer interval of the Firehose stream is set for 60 seconds. An OpenSearch linear model
generates real-time sales forecasts based on the data and presents the data in an OpenSearch
dashboard.
The company needs to optimize the data ingestion pipeline to support sub-second latency for the
real-time dashboard.
Which change to the architecture will meet these requirements?
Options
Discussion
Option A matches what I had in a mock. Setting Firehose buffer interval to zero is the only way here to cut batch delay, so you get almost instant updates in OpenSearch. None of the other options really get true sub-second unless you change the buffer itself. Pretty sure it's A, but lmk if you see it differently.
Option A looks right. If you want sub-second latency, you can't wait for the 60-second buffer; Firehose needs to send records as they come in. Setting buffering to zero pushes data through immediately. Tuning PutRecordBatch helps with efficiency but the main thing is removing that delay. Pretty sure that's what AWS recommends for real-time use cases like this. Anyone see a downside?
Option A. I've seen this logic in a few official practice questions and in the AWS docs too.
Honestly AWS makes this so annoying, always buffer tweak questions. A
B tbh, DataSync with enhanced fan-out sounds like faster parallel processing to me. Option A feels like a buffer config trap.
Hmm, I don’t think A is the only way here. D could also help reduce latency since SQS decouples and can get close to real-time with the right consumers. Trap is thinking only buffer settings matter.
Zero buffering is what gets sub-second latency, so A.
Why not just tweak the Firehose buffer to zero instead of swapping out the whole service? Feels like B is a trap.
A, yeah. Zero buffering on Firehose is needed to get rid of that 60s delay. B and D both add more moving parts or aren't real-time enough. Not 100% sure since AWS likes their curveballs, but from what I know A matches the sub-second ask best. Disagree?
Probably A is the way to go since setting Firehose buffering to zero removes batch delay, which is key for sub-second latency. B is tempting but DataSync isn’t real-time for streaming events. Pretty sure about A, unless I missed something critical.
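For anyone who wants to see what "buffer interval to zero" looks like in practice, here is a rough sketch of the change most of the thread is describing. It assumes the stream uses the Firehose OpenSearch Service destination type (not the legacy Elasticsearch one) and has a single destination. The function names zero_buffering_update and apply_update and the stream name in the usage note are made up for illustration. It needs boto3 and AWS credentials to actually apply anything, and I haven't run it against a live stream.

```python
def zero_buffering_update(stream_name, version_id, destination_id):
    """Build the arguments for a Firehose UpdateDestination call that
    sets the buffering interval to 0 seconds (hypothetical helper)."""
    return {
        "DeliveryStreamName": stream_name,
        "CurrentDeliveryStreamVersionId": version_id,
        "DestinationId": destination_id,
        # Assumption: the stream targets the OpenSearch Service destination type.
        "AmazonopensearchserviceDestinationUpdate": {
            "BufferingHints": {"IntervalInSeconds": 0},
        },
    }


def apply_update(stream_name):
    """Look up the stream's current version and destination, then apply the update."""
    import boto3  # imported here so the builder above works without the AWS SDK

    firehose = boto3.client("firehose")
    description = firehose.describe_delivery_stream(
        DeliveryStreamName=stream_name
    )["DeliveryStreamDescription"]
    # Assumption: the stream has exactly one destination.
    destination_id = description["Destinations"][0]["DestinationId"]
    kwargs = zero_buffering_update(
        stream_name, description["VersionId"], destination_id
    )
    return firehose.update_destination(**kwargs)
```

You would call something like apply_update("sales-transactions") with your own stream name. The point is that this is a config change on the existing stream, so the rest of the pipeline stays as it is.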