Q: 6
[Data Engineering]
A company wants to predict stock market price trends. The company stores stock market data each
business day in Amazon S3 in Apache Parquet format. The company stores 20 GB of data each day for
each stock code.
A data engineer must use Apache Spark to perform batch preprocessing transformations quickly so the company can complete prediction jobs before the stock market opens the next day. The company plans to track more stock codes and needs a way to scale the preprocessing transformations.
Which AWS service or feature will meet these requirements with the LEAST development effort over
time?
Options
Discussion
B
A tbh
I don’t think EMR (B) is right for "least dev effort". Glue (A) offers managed, serverless Spark that scales automatically, and there’s basically no infrastructure to maintain. EMR is more hands-on and can be overkill unless you need deep Spark tuning. Athena and Lambda just don’t fit a Spark batch workload at this scale. Open to being corrected if anyone has counterexamples from recent AWS docs.
Actually it's A here. EMR (B) is a trap because it needs more ongoing setup and management, while Glue runs Spark with far less development effort, which is exactly what the question asks for. Athena and Lambda can't handle Spark batch jobs at this scale. If anyone's seen a newer AWS recommendation that changes this, let me know.
Probably B here since EMR handles Spark natively and you get a lot of flexibility for scaling big batch jobs. Glue is less manual but I think EMR's better for heavy Spark workflows. Not totally sure though, open to feedback.
I've seen similar practice questions, and the official AWS guide covers Glue for this scenario. Going with A.
Ugh, AWS loves to push Glue for all ETL. I was thinking B at first since EMR gives you more direct Spark control, but the setup's a pain if you want to scale out later or just manage tons of jobs. Wouldn't Athena be easier here?