Q: 12
Your company has grown rapidly and is now ingesting data at a significantly higher rate than
before. You manage the daily batch MapReduce analytics jobs in Apache Hadoop, but the recent
increase in data volume means the batch jobs are falling behind. You have been asked to
recommend ways the development team could increase the responsiveness of the analytics without
increasing costs. What should you recommend they do?
Options
Discussion
Option A makes sense if the main constraint is not raising costs. Pig sits on top of Hadoop, so you can optimize scripts for responsiveness without spinning up more resources, unlike C. Spark (B) works great but often needs infrastructure tweaks or more powerful nodes, which could mean extra spending. Hard to tell unless you know the exact bottleneck, but given the exam wording 'without increasing costs', A fits best I think. Thoughts?
Probably A here. If you need better responsiveness with no extra cost, Pig can help you optimize MapReduce jobs without extra infra. Spark (B) feels tempting but usually means more resources or complexity. Pretty sure A's what the question wants, but open to other opinions.
A tbh. Pig can help tweak processing without extra infrastructure spend, which fits the "no added cost" part. Spark might be faster but has migration headaches and could impact cost control. Disagree?
A. Tempting to pick B since Spark's fast, but it usually means more cost or infra changes. This one's about optimizing within the current setup, so A fits better even if it's not the most modern choice.
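For anyone wondering what "optimize with Pig" might actually look like in practice: a minimal sketch of the kind of script-level tuning available on an existing cluster. The relation and file names (`daily_logs`, `daily_totals`, the `user`/`bytes` schema) are made up for illustration; the directives themselves (`SET default_parallel`, early `FILTER`, the per-operator `PARALLEL` clause) are standard Pig Latin features.

```pig
-- Hypothetical daily log job; names are illustrative only.
SET default_parallel 40;                         -- raise default reducer count cluster-wide

events   = LOAD 'daily_logs' USING PigStorage('\t')
           AS (user:chararray, bytes:long);
filtered = FILTER events BY bytes > 0;           -- filter early to shrink the shuffle
grouped  = GROUP filtered BY user PARALLEL 40;   -- per-operator reducer parallelism
totals   = FOREACH grouped GENERATE group, SUM(filtered.bytes);

STORE totals INTO 'daily_totals';
```

None of this needs new hardware, which is why it lines up with the "no increased cost" constraint in the question.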
Saw a really similar one in my mock, chose B.