Q: 17
You have an upstream process that writes data to Cloud Storage. This data is then read by an Apache
Spark job that runs on Dataproc. These jobs are run in the us-central1 region, but the data could be
stored anywhere in the United States. You need to have a recovery process in place in case of a
catastrophic single region failure. You need an approach with a maximum of 15 minutes of data loss
(RPO=15 mins). You want to ensure that there is minimal latency when reading the dat
a. What should you do?
Options
Discussion
Seen similar on practice exams. Probably A.
A since the question is about cleansing before AI and analytics, but if true streaming was needed maybe C. Context matters here.
Wouldn't C be better for real-time cleansing though? Dataflow can process and clean on the fly.
Be respectful. No spam.