Q: 18
You want to build a managed Hadoop system as your data lake. The data transformation process is
composed of a series of Hadoop jobs executed in sequence. To accomplish the design of separating
storage from compute, you decided to use the Cloud Storage connector to store all input data,
output data, and intermediary data.
However, you noticed that one Hadoop job runs very slowly with Cloud Dataproc, when compared
with the on-premises bare-metal Hadoop environment (8-core nodes with 100-GB RAM). Analysis
shows that this particular Hadoop job is disk I/O intensive. You want to resolve the issue. What
should you do?
Options
Discussion
Option A. Similar questions in the GCP practice exams highlight in-memory processing as key for performance issues.
C/D? I don’t think C helps since network isn’t the bottleneck here, but D seems like a trap too. Going with A.
Probably A, saw a similar question in an old exam report. In-memory data handling helps with I/O bottlenecks.
A tbh. In-memory processing avoids the disk I/O bottleneck, which this scenario is all about. B is a common distractor but won't help as much if your real problem is slow storage rather than capacity. Pretty sure A's what Google expects here, seen it in similar practice sets, but open to correction.
D, but check the official study guide and sample questions on I/O bottlenecks.
A imo