1. AWS Big Data Blog
"Implement a CDC-based ETL pipeline using Amazon S3, AWS Glue, and Apache Hudi": This article details the exact pattern described in the correct answer. It states: "Apache Hudi enables you to manage data at the record level in Amazon S3 to perform inserts, updates, and deletes... This helps in use cases like change data capture (CDC)..." This directly supports using a format like Hudi for the described CDC operation.
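To make "record-level inserts, updates, and deletes" concrete, here is a minimal in-memory sketch of the merge semantics a CDC upsert relies on. It is not Hudi's API. It models two Hudi concepts: a record key (configured in Hudi via `hoodie.datasource.write.recordkey.field`) and a precombine field (`hoodie.datasource.write.precombine.field`), which picks the latest change when several share a key. The field names `id`, `ts`, and `op` (with `I`/`U`/`D` values) are hypothetical.

```python
from typing import Any, Dict, Iterable

Record = Dict[str, Any]


def apply_cdc(table: Dict[Any, Record], changes: Iterable[Record],
              key_field: str = "id", order_field: str = "ts") -> Dict[Any, Record]:
    """Simplified model of a record-level CDC merge (hypothetical field names).

    1. Precombine: for each record key, keep only the change with the largest
       order_field value (later arrivals win ties).
    2. Merge: a winning "D" change deletes the key; "I" and "U" changes are
       both applied as upserts.
    """
    latest: Dict[Any, Record] = {}
    for change in changes:
        key = change[key_field]
        if key not in latest or change[order_field] >= latest[key][order_field]:
            latest[key] = change

    result = dict(table)  # do not mutate the caller's table
    for key, change in latest.items():
        if change["op"] == "D":
            result.pop(key, None)
        else:
            # Store the row without the CDC operation flag.
            result[key] = {k: v for k, v in change.items() if k != "op"}
    return result


table = {1: {"id": 1, "name": "a", "ts": 1}, 2: {"id": 2, "name": "b", "ts": 1}}
changes = [
    {"id": 1, "name": "a2", "ts": 2, "op": "U"},
    {"id": 2, "ts": 2, "op": "D"},
    {"id": 3, "name": "c", "ts": 2, "op": "I"},
    {"id": 1, "name": "a-stale", "ts": 0, "op": "U"},  # older, loses precombine
]
print(apply_cdc(table, changes))
```

Only the changed keys are touched. A plain Parquet or CSV dataset in S3 has no equivalent operation without rewriting whole files, and that gap is what a transactional table format fills.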
2. AWS Glue Developer Guide
"Using transactional data lake frameworks with AWS Glue": The documentation confirms native support for these formats: "AWS Glue supports the open-source transactional data lake frameworks: Apache Hudi, Apache Iceberg, and Linux Foundation Delta Lake. These frameworks allow you to run ACID transactions on your Amazon S3 based data lake." This shows that the solution in option C is a well-supported pattern on AWS.
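As an illustration of what "native support" looks like in practice, the sketch below builds the keyword arguments for a boto3 Glue `create_job` call with Hudi enabled through the `--datalake-formats` job parameter. It assumes AWS Glue 4.0, and the `--conf` value follows the Spark settings the AWS Glue Hudi documentation recommends. The job name, role ARN, and script path are hypothetical, and the function only builds a dictionary. It does not call AWS.

```python
def hudi_glue_job_definition(name: str, role_arn: str, script_s3_path: str) -> dict:
    """Build kwargs for boto3's glue.create_job with the Hudi framework enabled.

    Assumes AWS Glue 4.0; all argument values passed in are hypothetical.
    """
    return {
        "Name": name,
        "Role": role_arn,
        "GlueVersion": "4.0",
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_s3_path,
            "PythonVersion": "3",
        },
        "DefaultArguments": {
            # Tells Glue to load its bundled Apache Hudi libraries.
            "--datalake-formats": "hudi",
            # Spark settings recommended for Hudi in the AWS Glue documentation.
            "--conf": "spark.serializer=org.apache.spark.serializer.KryoSerializer "
                      "--conf spark.sql.hive.convertMetastoreParquet=false",
        },
    }


job = hudi_glue_job_definition(
    "cdc-hudi-job",                                   # hypothetical name
    "arn:aws:iam::123456789012:role/GlueJobRole",     # hypothetical role
    "s3://example-bucket/scripts/cdc_job.py",         # hypothetical script path
)
# With credentials configured, this could be submitted with:
# boto3.client("glue").create_job(**job)
```

No extra JAR or custom connector appears in the definition. Enabling the framework is a single job parameter, which is what makes this a low-friction pattern on Glue.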
3. AWS Lambda Developer Guide
"Lambda quotas": The official documentation lists the maximum "Function timeout" as 900 seconds (15 minutes). This hard limit makes option A infeasible for processing terabyte-scale files, which would take significantly longer than 15 minutes in a single invocation.
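A back-of-the-envelope calculation shows the size of the gap. The 100 MB/s sustained read-and-transform rate below is a hypothetical assumption, not a documented Lambda figure. Real throughput depends on memory size, network, and file format, but the conclusion holds across any plausible rate.

```python
LAMBDA_MAX_TIMEOUT_S = 900  # documented maximum function timeout (15 minutes)


def seconds_to_process(size_bytes: float, throughput_bytes_per_s: float) -> float:
    """Time needed to stream a file at a constant throughput."""
    return size_bytes / throughput_bytes_per_s


def fits_in_one_invocation(size_bytes: float, throughput_bytes_per_s: float) -> bool:
    return seconds_to_process(size_bytes, throughput_bytes_per_s) <= LAMBDA_MAX_TIMEOUT_S


TB = 10**12
ASSUMED_THROUGHPUT = 100 * 10**6  # hypothetical 100 MB/s

needed = seconds_to_process(TB, ASSUMED_THROUGHPUT)
max_bytes = LAMBDA_MAX_TIMEOUT_S * ASSUMED_THROUGHPUT
print(f"1 TB needs {needed:.0f} s, {needed / LAMBDA_MAX_TIMEOUT_S:.1f}x the timeout")
print(f"Largest file per invocation: {max_bytes / 10**9:.0f} GB")
```

At the assumed rate, a single invocation can process only about 90 GB before it is terminated. A 1 TB file would need roughly 11 times the maximum timeout.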
4. AWS Database Migration Service User Guide
"What Is AWS Database Migration Service?": The guide describes AWS DMS as a tool to "migrate your data to and from most widely used commercial and open-source databases." Its primary use case is database-to-database replication, not performing CDC on flat files in S3. This makes options B and D architecturally inappropriate.