Q: 5
Your company is migrating their 30-node Apache Hadoop cluster to the cloud. They want to re-use
Hadoop jobs they have already created and minimize the management of the cluster as much as
possible. They also want to be able to persist data beyond the life of the cluster. What should you do?
Options
Discussion
Why would you not go with D? GCS is the only option that lets your data survive cluster deletion, not just restarts. Persistent disks (option B) are tied to the cluster lifecycle.
It's D
B. Had something like this in a mock and picked B, since persistent disks should keep HDFS data around even if nodes go down. Seemed like the best fit at the time, but open to correction if I'm missing something.
It's D. Persistent disks (B) only handle node reboots, not full cluster deletion, which is a common trap in these questions. GCS with Dataproc lets your data outlive the cluster, and I think that's the key requirement here. Open to other reasons if someone thinks B still works.
D is correct here
B tbh
A is wrong; it's D. Persistent disks (B) don't help if you delete the cluster, while GCS with the connector lets data survive no matter what. Classic trap in these migration questions, saw a similar gotcha on practice tests.
My vote is D, saw a very similar question in some exam reports and GCS is the key for persistence.
Don't think D is the only close answer; B seems reasonable too, since persistent disks can keep HDFS data across reboots. The persistent disk option trips people up because it sounds durable. Am I missing something here? B
Definitely something similar in the official practice questions. D
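For anyone who wants to verify the D answer hands-on, here's a minimal gcloud sketch of the pattern the thread describes: Dataproc runs the existing Hadoop jobs, and the GCS connector (preinstalled on Dataproc) lets them read and write `gs://` paths so output survives cluster deletion. All names here (cluster, bucket, jar, region) are placeholders I made up, not from the question.

```shell
# Create a managed Dataproc cluster sized like the on-prem one
gcloud dataproc clusters create my-cluster \
    --region=us-central1 \
    --num-workers=30

# Submit an existing Hadoop jar unchanged, pointing input/output at GCS
# instead of HDFS so the data outlives the cluster
gcloud dataproc jobs submit hadoop \
    --cluster=my-cluster \
    --region=us-central1 \
    --jar=gs://my-bucket/jars/wordcount.jar \
    -- gs://my-bucket/input/ gs://my-bucket/output/

# Delete the cluster when done; everything under gs://my-bucket persists
gcloud dataproc clusters delete my-cluster --region=us-central1
```

This is exactly why B falls short: the persistent disks backing HDFS are deleted along with the cluster, while the bucket is a separate resource.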