1. AWS Glue Developer Guide: "AWS Glue provides a serverless environment to run your ETL jobs on a fully managed scale-out Apache Spark environment... AWS Glue crawlers scan your data stores to determine the schema for your data and then create a metadata table in your AWS Glue Data Catalog."
Source: AWS Glue Developer Guide, "What is AWS Glue?", Introduction section.
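To make the "serverless scale-out Spark" point concrete, here is a minimal sketch of the kind of request you might pass to boto3's Glue client (e.g. `boto3.client("glue").start_job_run(**run_params)`). The job name, worker type, and worker count are hypothetical placeholders, not values from the cited guide.

```python
# Sketch only: assembling a start_job_run request for a Glue Spark job.
# Glue provisions and scales the Spark workers; no servers are managed by you.

def build_job_run_params(job_name, worker_type="G.1X", num_workers=10):
    """Assemble hypothetical start_job_run parameters for a serverless Glue job."""
    return {
        "JobName": job_name,
        "WorkerType": worker_type,       # Glue-managed Spark worker size
        "NumberOfWorkers": num_workers,  # scale out without provisioning servers
    }

run_params = build_job_run_params("nightly-etl")
print(run_params["NumberOfWorkers"])  # → 10
```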
2. AWS Documentation - Choosing between AWS Glue and Amazon EMR: "AWS Glue is a good choice when your use case is ETL and you are looking for a serverless offering... If you want to avoid managing servers, you can use AWS Glue to run your Spark and Python shell workloads." This highlights Glue's lower operational overhead.
Source: AWS Big Data Blog, "Choosing between AWS Glue and Amazon EMR".
3. AWS Glue Developer Guide - Crawlers: "A crawler connects to a data store, progresses through a prioritized list of classifiers to determine the schema for your data, and then creates metadata tables in your AWS Glue Data Catalog." This confirms its schema-detection capability for diverse sources.
Source: AWS Glue Developer Guide, "Defining Crawlers", Crawler concepts section.
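The crawler behavior above can be sketched as the payload you could pass to `boto3.client("glue").create_crawler(**crawler_params)`. Every name, role ARN, S3 path, and classifier below is a hypothetical placeholder for illustration.

```python
# Sketch of a create_crawler request: the crawler connects to the S3 target,
# tries the listed custom classifiers (in priority order) before the built-in
# ones, and writes the inferred schema as tables into the Data Catalog database.

crawler_params = {
    "Name": "sales-data-crawler",
    "Role": "arn:aws:iam::123456789012:role/GlueCrawlerRole",
    "DatabaseName": "sales_catalog",  # Data Catalog database for metadata tables
    "Targets": {
        "S3Targets": [{"Path": "s3://example-bucket/sales/"}],
    },
    "Classifiers": ["my-custom-csv-classifier"],  # optional, prioritized
}
print(crawler_params["DatabaseName"])  # → sales_catalog
```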
4. AWS Lambda Developer Guide - Quotas: "Timeout: 900 seconds (15 minutes)". This confirms the execution-time limit that makes Lambda unsuitable for the described large-scale ETL task.
Source: AWS Lambda Developer Guide, "Lambda quotas".
5. Amazon Redshift Database Developer Guide - Redshift Spectrum: "Amazon Redshift Spectrum is a feature of Amazon Redshift that enables you to run SQL queries against exabytes of unstructured data in Amazon S3." This shows that Redshift Spectrum queries data already residing in S3; it does not perform the initial ETL that lands the data in S3.
Source: Amazon Redshift Database Developer Guide, "Getting started with Amazon Redshift Spectrum".