Q: 15
A company stores time-series data about user clicks in an Amazon S3 bucket. The raw data consists of
millions of rows of user activity every day. ML engineers access the data to develop their ML models.
The ML engineers need to generate daily reports and analyze click trends over the past 3 days by
using Amazon Athena. The company must retain the data for 30 days before archiving the data.
Which solution will provide the HIGHEST performance for data retrieval?
Options
Discussion
C or D? D is overkill: separate S3 buckets per day make management a pain and don't boost Athena performance. I think C (partition by date prefix) is best because Athena prunes partitions, so queries scan far less data and run faster. Pretty sure that's what AWS recommends.
I think C here. Partitioning by date prefix lets Athena scan just what it needs for recent data, way faster than scanning everything. Also lines up with the lifecycle requirement. Saw a similar question on a practice test. Makes sense?
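To make the partition-pruning argument concrete, here is a rough sketch of what a date-prefixed layout and a 3-day query could look like. The `clicks` table name, the `dt` partition key, and the `clicks/dt=YYYY-MM-DD/` prefix are all made up for illustration and are not taken from the question. It also assumes "past 3 days" includes today. The code only builds strings; it does not call AWS.

```python
from datetime import date, timedelta


def partition_prefix(day: date, root: str = "clicks") -> str:
    """Hive-style S3 key prefix for one day's data (hypothetical layout)."""
    return f"{root}/dt={day.isoformat()}/"


def last_n_days(today: date, n: int = 3) -> list[date]:
    """Most recent n calendar days, newest first, including today."""
    return [today - timedelta(days=i) for i in range(n)]


def trend_query(today: date, n: int = 3, table: str = "clicks") -> str:
    """SQL that filters on the partition column so only n partitions are read."""
    days = ", ".join(f"'{d.isoformat()}'" for d in last_n_days(today, n))
    return (
        f"SELECT dt, COUNT(*) AS clicks FROM {table} "
        f"WHERE dt IN ({days}) GROUP BY dt ORDER BY dt"
    )


if __name__ == "__main__":
    today = date(2024, 5, 3)
    for d in last_n_days(today):
        print(partition_prefix(d))
    print(trend_query(today))
```

Because the `WHERE` clause filters on the partition column, the query only needs to read the three matching prefixes rather than all 30 days of retained data.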
Probably D, question is really clear and concise for ML/Athena use cases.