Q: 2
[Data Engineering]
A retail company is ingesting purchasing records from its network of 20,000 stores to Amazon S3 by using Amazon Kinesis Data Firehose. The company uses a small, server-based application in each store to send the data to AWS over the internet. The company uses this data to train a machine learning model that is retrained each day. The company's data science team has identified existing attributes on these records that could be combined to create an improved model.
Which change will create the required transformed records with the LEAST operational overhead?
Discussion
Going with A here too. Using Lambda for transformation within Firehose avoids managing infrastructure, and AWS handles the scaling. The other choices involve running clusters or EC2, which is more to maintain. Pretty sure this is the least ops work, correct me if I missed something!
A. Using Lambda for transformation with Firehose means no server management, automatic scaling, and it's built right into the delivery stream. The other options add much more operational work. If the transformation fits in a Lambda function, this is the easiest path. Tell me if I'm missing any curveballs here.
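For anyone wondering what "Lambda built into the delivery stream" looks like in practice, here is a minimal Python sketch of a Firehose data-transformation handler. Firehose passes the function a batch of base64-encoded records. The function must return each `recordId` with a `result` of `Ok`, `Dropped`, or `ProcessingFailed` and the re-encoded `data`. The attribute names `quantity`, `unit_price`, and `total_amount` are hypothetical, because the question doesn't say which attributes get combined. The sketch also assumes the store application sends JSON records.

```python
import base64
import json


def lambda_handler(event, context):
    """Sketch of a Kinesis Data Firehose transformation handler.

    Each output record echoes the input recordId and carries a result
    ("Ok", "Dropped", or "ProcessingFailed") plus base64-encoded data.
    """
    output = []
    for record in event["records"]:
        try:
            # Assumption: each incoming record is a single JSON object.
            payload = json.loads(base64.b64decode(record["data"]))
            # Hypothetical combined feature: the question does not name the attributes.
            payload["total_amount"] = payload["quantity"] * payload["unit_price"]
            # Append a newline so records stay separable once Firehose
            # concatenates them into S3 objects.
            encoded = base64.b64encode(
                (json.dumps(payload) + "\n").encode("utf-8")
            ).decode("utf-8")
            output.append(
                {"recordId": record["recordId"], "result": "Ok", "data": encoded}
            )
        except (ValueError, KeyError, TypeError):
            # Return the original data flagged as failed so Firehose can send
            # it to the error output instead of silently losing it.
            output.append(
                {
                    "recordId": record["recordId"],
                    "result": "ProcessingFailed",
                    "data": record["data"],
                }
            )
    return {"records": output}
```

Scaling and batching are handled by Firehose and Lambda, so the only thing to maintain is this function. That is the "least operational overhead" argument the comments make.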
C/D? Both need more ops than A, but I can't tell whether option D does what the question asks.
It's A
It's A. EMR in B is tempting, but it's overkill for just transforming Firehose records and needs far more management than Lambda.
Probably A. B is a trap, since EMR clusters need far more management and scheduling effort.
Option A. Less overhead, since you can plug Lambda straight into Firehose for transformation, with no clusters or EC2 to manage. Not 100% sure, but I believe that's what AWS recommends for this setup.
B, tbh, but only if Lambda can't handle the transformation's size or complexity. Otherwise A.
I don’t think it’s B. A is the only option with built-in transformation, so it's less ops work.
B