Q: 8
You are designing a data processing pipeline. The pipeline must be able to scale automatically as load
increases. Messages must be processed at least once, and must be ordered within windows of 1
hour. How should you design the solution?
Options
Discussion
Option D is the way to go. Pub/Sub plus Dataflow are both cloud-native, can autoscale automatically, and Dataflow supports windowed ordering for that 1-hour requirement. B is tempting if you like Kafka, but it isn't as integrated or fully managed in GCP as Pub/Sub. Anyone see a use case where B would actually be better?
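For anyone unsure what "ordered within 1-hour windows" actually means: Dataflow (Apache Beam) assigns each message to a fixed window based on its event timestamp, and ordering is then only needed inside each window, not globally. Here's a rough stdlib-only Python sketch of that idea (this is illustrative, not real Beam code; the function names and sample timestamps are made up):

```python
from collections import defaultdict

WINDOW_SECONDS = 3600  # 1-hour fixed windows, like Beam's FixedWindows(3600)

def window_start(event_ts: int) -> int:
    """Return the start of the 1-hour window this event timestamp falls into."""
    return event_ts - (event_ts % WINDOW_SECONDS)

def group_into_windows(messages):
    """Group (timestamp, payload) pairs by 1-hour window, sorted within each window.

    Mirrors the idea of windowed ordering: no global order is enforced,
    only order inside each fixed window.
    """
    windows = defaultdict(list)
    for ts, payload in messages:
        windows[window_start(ts)].append((ts, payload))
    return {w: sorted(msgs) for w, msgs in windows.items()}

# Messages arriving out of order across two hourly windows
msgs = [(3700, "b"), (100, "a"), (3650, "c"), (200, "d")]
grouped = group_into_windows(msgs)
print(grouped)
# Windows start at 0 and 3600; messages end up sorted within each window
```

In the real service, Pub/Sub gives you at-least-once delivery without global ordering, and Dataflow's windowing plus event-time sorting is what satisfies the per-hour ordering requirement.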
I don't think it's B. D is more cloud-native and actually autoscaling, plus Dataflow gives that windowed ordering you need.
It's D since Pub/Sub and Dataflow together are cloud-native, fully managed, and both autoscale with load. Dataflow specifically gives you windowing for that 1-hour ordering, which is what the question asks for. The Kafka options don't fit as seamlessly on GCP for this use case. Pretty sure about this but open to pushback if I missed some requirement.
Cloud Pub/Sub plus Dataflow (D) is serverless and autoscaling, which fits the scalable pipeline need. Dataflow handles windowed ordering for that 1-hour window. Pretty sure that's what they want here but happy if someone has another angle.
D. Pub/Sub and Dataflow are fully managed and actually autoscale on demand. Also, Dataflow supports windowed ordering, so you can order messages per hour just like the question needs. Not 100% sure if there's any gotcha, but this combo is pretty standard for GCP. Let me know if I missed anything.
Probably D, since Pub/Sub with Dataflow is the only fully managed combo here that autoscales and handles windowed ordering properly.
D imo. Had something like this in a mock, Pub/Sub plus Dataflow is made for scaling and windowed order.
Definitely D for this one
D not C. Dataflow handles windowed ordering and scales automatically. Pub/Sub plus Dataproc won't guarantee the window semantics here.
D