Q: 20
You are designing a pipeline that publishes application events to a Pub/Sub topic. You need to
aggregate events across hourly intervals before loading the results to BigQuery for analysis. Your
solution must be scalable so it can process and load large volumes of events to BigQuery. What
should you do?
Options
Discussion
I think C could work, since Cloud Functions can be triggered on a schedule and can process Pub/Sub messages, but I'm not sure how well it scales under high event loads. It's probably not as robust as Dataflow, but still viable for light pipelines. Does anyone see issues?
A imo. Streaming Dataflow with tumbling windows is made for scalable, real-time aggregation like this. B is tempting but loses out on near-real-time processing and scalability, especially with huge event spikes. Open to other takes though.
Streaming aggregation is the scalable way here, so A.
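For anyone unsure what a tumbling (fixed) window does to the events, here is a minimal plain-Python sketch of the idea, not a Dataflow pipeline. Each event goes into exactly one non-overlapping hourly bucket based on its timestamp, and the aggregate is computed per bucket. The event tuples, `window_start`, and `aggregate_hourly` are made up for illustration. In an Apache Beam pipeline the equivalent step would typically be `beam.WindowInto(window.FixedWindows(3600))` followed by a combine, but that part is not shown here.

```python
from collections import Counter
from datetime import datetime, timezone

WINDOW_SECONDS = 3600  # one-hour tumbling (fixed) window


def window_start(ts: datetime) -> datetime:
    """Return the start of the hourly window that contains ts."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % WINDOW_SECONDS, tz=timezone.utc)


def aggregate_hourly(events):
    """Count events per (window start, event type).

    events is an iterable of (timestamp, event_type) pairs (hypothetical shape).
    """
    counts = Counter()
    for ts, event_type in events:
        counts[(window_start(ts), event_type)] += 1
    return counts


# Hypothetical sample events; the 11:00:00 event starts a new window.
events = [
    (datetime(2024, 1, 1, 10, 5, tzinfo=timezone.utc), "login"),
    (datetime(2024, 1, 1, 10, 59, 59, tzinfo=timezone.utc), "login"),
    (datetime(2024, 1, 1, 11, 0, tzinfo=timezone.utc), "login"),
    (datetime(2024, 1, 1, 11, 30, tzinfo=timezone.utc), "purchase"),
]
hourly = aggregate_hourly(events)
```

Windows never overlap, so each event is counted once, and each hourly result can be written to BigQuery as soon as its window closes.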