Question 18

Question

You want to build a managed Hadoop system as your data lake. The data transformation process is
composed of a series of Hadoop jobs executed in sequence. To accomplish the design of separating
storage from compute, you decided to use the Cloud Storage connector to store all input data,
output data, and intermediary data. However, you noticed that one Hadoop job runs very slowly
with Cloud Dataproc, when compared
with the on-premises bare-metal Hadoop environment (8-core nodes with 100-GB RAM). Analysis
shows that this particular Hadoop job is disk I/O intensive. You want to resolve the issue. What
should you do?

Accepted Answer

Allocate sufficient memory to the Hadoop cluster, so that the intermediary data of that particular
Hadoop job can be held in memory

Priya J. · Answer

Option A. Similar questions in the GCP practice exams highlight in-memory processing as the key to fixing performance issues like this one.

Ravi O. · Answer

C/D? I don’t think C helps since network isn’t the bottleneck here, but D seems like a trap too. Going with A.

Anita R. · Answer

Probably A, saw a similar question in an old exam report. In-memory data handling helps with I/O bottlenecks.

Ravi S. · Answer

A tbh. In-memory processing avoids the disk I/O bottleneck, which this scenario is all about. B is a common distractor but won't help as much if your real problem is slow storage rather than capacity. Pretty sure A's what Google expects here, seen it in similar practice sets, but open to correction.

Mason T. · Answer

D, but check the official study guide and sample questions on I/O bottlenecks.
