Q: 1
A company extracts approximately 1 TB of data every day from data sources such as SAP HANA,
Microsoft SQL Server, MongoDB, Apache Kafka, and Amazon DynamoDB. Some of the data sources
have undefined data schemas or data schemas that change.
A data engineer must implement a solution that can detect the schema for these data sources. The
solution must extract, transform, and load the data to an Amazon S3 bucket. The company has a
service level agreement (SLA) to load the data into the S3 bucket within 15 minutes of data creation.
Which solution will meet these requirements with the LEAST operational overhead?
Options
Discussion
B. AWS Glue is built for this scenario, especially with changing or undefined schemas.
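For anyone who wants to see the schema-detection side concretely, here's a minimal boto3 sketch of a Glue crawler scheduled inside the 15-minute SLA window. The crawler name, role ARN, database, and S3 path are all placeholders, not values from the question:

```python
import boto3

glue = boto3.client("glue")

# Hypothetical names: crawler, role, database, and path are placeholders.
glue.create_crawler(
    Name="daily-ingest-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    DatabaseName="data_lake",
    Targets={"S3Targets": [{"Path": "s3://example-landing-bucket/raw/"}]},
    SchemaChangePolicy={
        "UpdateBehavior": "UPDATE_IN_DATABASE",  # pick up schema changes
        "DeleteBehavior": "DEPRECATE_IN_DATABASE",
    },
    # Run every 15 minutes to stay inside the SLA window.
    Schedule="cron(0/15 * * * ? *)",
)
glue.start_crawler(Name="daily-ingest-crawler")
```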
Q: 2
A data engineer must orchestrate a data pipeline that consists of one AWS Lambda function and one
AWS Glue job. The solution must integrate with AWS services.
Which solution will meet these requirements with the LEAST management overhead?
Options
Discussion
A, imo. Step Functions is fully managed and integrates natively with Lambda and Glue, so there's barely any infrastructure to handle. The other options need you to run EC2 or EKS, which means more ops overhead. Pretty sure that's what they want here.
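Not the official answer, but a rough sketch of what that state machine could look like, defined and created with boto3. The function/job ARNs, names, and the execution role are placeholders:

```python
import json
import boto3

# Two-step workflow: Lambda first, then the Glue job (all ARNs are placeholders).
definition = {
    "StartAt": "RunLambda",
    "States": {
        "RunLambda": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:prepare-data",
            "Next": "RunGlueJob",
        },
        "RunGlueJob": {
            "Type": "Task",
            # .sync makes Step Functions wait for the Glue job to finish.
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "transform-job"},
            "End": True,
        },
    },
}

sfn = boto3.client("stepfunctions")
sfn.create_state_machine(
    name="etl-pipeline",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/StepFunctionsExecRole",
)
```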
Q: 3
A technology company currently uses Amazon Kinesis Data Streams to collect log data in real time.
The company wants to use Amazon Redshift for downstream real-time queries and to enrich the log
data.
Which solution will ingest data into Amazon Redshift with the LEAST operational overhead?
Options
Discussion
Saw something like this in a practice test; it points to D. Redshift streaming ingestion has barely any setup compared to the others.
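For reference, streaming ingestion is just two SQL statements; here they are run through the Redshift Data API so it stays scriptable. Cluster ID, database, user, role ARN, and stream name are all placeholders:

```python
import boto3

rsd = boto3.client("redshift-data")

# Map the Kinesis stream into Redshift (role ARN is a placeholder).
rsd.execute_statement(
    ClusterIdentifier="log-cluster",
    Database="dev",
    DbUser="admin",
    Sql="""CREATE EXTERNAL SCHEMA kinesis_schema
           FROM KINESIS
           IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftStreamingRole';""",
)

# Materialized view over the stream; AUTO REFRESH keeps it near real time.
rsd.execute_statement(
    ClusterIdentifier="log-cluster",
    Database="dev",
    DbUser="admin",
    Sql="""CREATE MATERIALIZED VIEW log_stream_mv AUTO REFRESH YES AS
           SELECT approximate_arrival_timestamp,
                  json_parse(kinesis_data) AS payload
           FROM kinesis_schema."app-log-stream";""",
)
```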
Q: 4
A company uses an Amazon Redshift provisioned cluster as its database. The Redshift cluster has five
reserved ra3.4xlarge nodes and uses key distribution.
A data engineer notices that one of the nodes frequently has a CPU load over 90%. SQL queries that
run on the node are queued. The other four nodes usually have a CPU load under 15% during daily
operations.
The data engineer wants to maintain the current number of compute nodes. The data engineer also
wants to balance the load more evenly across all five compute nodes.
Which solution will meet these requirements?
Options
Discussion
Option B looks right. If one node is overloaded and others are mostly idle, that's usually a sign the distribution key isn't set well and data isn't spread out. Picking a column with high cardinality should make things more balanced across nodes. Pretty sure, but open to other ideas.
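If it helps, changing the distribution key is a one-liner; here it is via the Redshift Data API. The table and column names are hypothetical, just illustrating the high-cardinality idea:

```python
import boto3

rsd = boto3.client("redshift-data")

# Repoint the distribution key to a high-cardinality column so rows spread
# evenly across all five nodes; cluster, table, and column are placeholders.
rsd.execute_statement(
    ClusterIdentifier="analytics-cluster",
    Database="dev",
    DbUser="admin",
    Sql="ALTER TABLE sales ALTER DISTKEY order_id;",
)
```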
Q: 5
A data engineer is configuring an AWS Glue job to read data from an Amazon S3 bucket. The data
engineer has set up the necessary AWS Glue connection details and an associated IAM role.
However, when the data engineer attempts to run the AWS Glue job, the data engineer receives an
error message that indicates that there are problems with the Amazon S3 VPC gateway endpoint.
The data engineer must resolve the error and connect the AWS Glue job to the S3 bucket.
Which solution will meet this requirement?
Options
Discussion
No comments yet.
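For anyone who lands here: the usual fix is to make sure an S3 gateway endpoint exists in the VPC and is associated with the route table of the subnet the Glue connection uses. A minimal boto3 sketch (the VPC ID, route table ID, and Region are placeholders):

```python
import boto3

ec2 = boto3.client("ec2")

# Create the S3 gateway endpoint and attach it to the route table used by
# the Glue connection's subnet. All IDs below are placeholders.
ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0abc1234",
    ServiceName="com.amazonaws.us-east-1.s3",
    RouteTableIds=["rtb-0def5678"],
)
```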
Q: 6
A company loads transaction data for each day into Amazon Redshift tables at the end of each day.
The company wants to have the ability to track which tables have been loaded and which tables still
need to be loaded.
A data engineer wants to store the load statuses of Redshift tables in an Amazon DynamoDB table.
The data engineer creates an AWS Lambda function to publish the details of the load statuses to
DynamoDB.
How should the data engineer invoke the Lambda function to write load statuses to the DynamoDB
table?
Options
Discussion
Option B
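Whatever invocation path the correct option uses, the Lambda side is a small put_item. A minimal sketch, with the table name, key schema, and event fields all hypothetical:

```python
import boto3
from datetime import datetime, timezone

dynamodb = boto3.resource("dynamodb")
# "redshift_load_status" is a placeholder table name.
table = dynamodb.Table("redshift_load_status")

def lambda_handler(event, context):
    # Assumes the invoking event carries the Redshift table name and outcome.
    table.put_item(
        Item={
            "table_name": event["table_name"],
            "load_date": datetime.now(timezone.utc).isoformat(),
            "status": event.get("status", "LOADED"),
        }
    )
    return {"statusCode": 200}
```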
Q: 7
A company maintains multiple extract, transform, and load (ETL) workflows that ingest data from the
company's operational databases into an Amazon S3 based data lake. The ETL workflows use AWS
Glue and Amazon EMR to process data.
The company wants to improve the existing architecture to provide automated orchestration and to
require minimal manual effort.
Which solution will meet these requirements with the LEAST operational overhead?
Options
Discussion
Nice clear question; matches what I've seen in exam reports. A is the way to go: AWS Glue workflows are built for orchestrating ETL pipelines and are fully managed, so there's less operational effort needed.
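Rough sketch of what a Glue workflow with a scheduled start and a dependency trigger looks like in boto3. The workflow, trigger, and job names are placeholders:

```python
import boto3

glue = boto3.client("glue")

# Container for the orchestrated pipeline (name is a placeholder).
glue.create_workflow(Name="etl-orchestration")

# Scheduled trigger that kicks off the first job inside the workflow.
glue.create_trigger(
    Name="nightly-start",
    WorkflowName="etl-orchestration",
    Type="SCHEDULED",
    Schedule="cron(0 2 * * ? *)",
    Actions=[{"JobName": "ingest-operational-db"}],
    StartOnCreation=True,
)

# Conditional trigger: run the transform job only after ingest succeeds.
glue.create_trigger(
    Name="after-ingest",
    WorkflowName="etl-orchestration",
    Type="CONDITIONAL",
    Predicate={
        "Conditions": [{
            "LogicalOperator": "EQUALS",
            "JobName": "ingest-operational-db",
            "State": "SUCCEEDED",
        }]
    },
    Actions=[{"JobName": "transform-to-data-lake"}],
    StartOnCreation=True,
)
```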
Q: 8
A company has multiple applications that use datasets that are stored in an Amazon S3 bucket. The
company has an ecommerce application that generates a dataset that contains personally
identifiable information (PII). The company has an internal analytics application that does not require
access to the PII.
To comply with regulations, the company must not share PII unnecessarily. A data engineer needs to
implement a solution that will redact PII dynamically, based on the needs of each application that
accesses the dataset.
Which solution will meet the requirements with the LEAST operational overhead?
Options
Discussion
It's B; saw a similar question in exam reports. S3 Object Lambda lets you do dynamic redaction without duplicating data.
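For context, the Object Lambda function sits in the GET path and rewrites the response. A minimal handler sketch, assuming the dataset is a JSON array of records; the PII field names are hypothetical:

```python
import json
import urllib.request
import boto3

s3 = boto3.client("s3")

# Hypothetical redaction: drop keys we treat as PII from each JSON record.
PII_FIELDS = {"name", "email", "phone"}

def lambda_handler(event, context):
    ctx = event["getObjectContext"]
    # Fetch the original object through the presigned URL S3 provides.
    original = urllib.request.urlopen(ctx["inputS3Url"]).read()

    records = json.loads(original)
    redacted = [
        {k: v for k, v in rec.items() if k not in PII_FIELDS}
        for rec in records
    ]

    # Return the transformed object to the caller.
    s3.write_get_object_response(
        Body=json.dumps(redacted).encode(),
        RequestRoute=ctx["outputRoute"],
        RequestToken=ctx["outputToken"],
    )
    return {"status_code": 200}
```

The PII-generating app reads through the regular bucket endpoint; the analytics app reads through the Object Lambda access point, so there's only one copy of the data.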
Q: 9
A company uses Amazon S3 to store semi-structured data in a transactional data lake. Some of the
data files are small, but other data files are tens of terabytes.
A data engineer must perform a change data capture (CDC) operation to identify changed data from
the data source. The data source sends a full snapshot as a JSON file every day and ingests the
changed data into the data lake.
Which solution will capture the changed data MOST cost-effectively?
Options
Discussion
C. Saw a similar question in some exam reports; open-source lakehouse formats like Hudi or Iceberg are built for this scenario.
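Roughly what that looks like with Hudi in PySpark: write the daily snapshot as an upsert and let Hudi reconcile changed rows against existing ones. Paths, table name, and key columns are placeholders, and the Spark session is assumed to have the Hudi connector configured:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Daily full snapshot from the source (path is a placeholder).
snapshot = spark.read.json("s3://example-bucket/snapshots/2024-01-01/")

hudi_options = {
    "hoodie.table.name": "transactions",
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.operation": "upsert",  # Hudi diffs against existing rows
}

(snapshot.write.format("hudi")
    .options(**hudi_options)
    .mode("append")
    .save("s3://example-bucket/lake/transactions/"))
```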
Q: 10
During a security review, a company identified a vulnerability in an AWS Glue job. The company
discovered that credentials to access an Amazon Redshift cluster were hard coded in the job script.
A data engineer must remediate the security vulnerability in the AWS Glue job. The solution must
securely store the credentials.
Which combination of steps should the data engineer take to meet these requirements? (Choose
two.)
Options
Discussion
D and E, I think. Secrets Manager for storing creds safely, and the IAM role needs permissions to access them. Not totally sure that's it, but it makes sense from practice exams. Can someone who tried this confirm?
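The remediated Glue script would then look something like this: fetch the credentials at run time instead of hard coding them. The secret name and its JSON keys are placeholders, and the job's role is assumed to have secretsmanager:GetSecretValue on that secret:

```python
import json
import boto3

# Pull the Redshift credentials from Secrets Manager at run time
# (secret name and field names below are placeholders).
secrets = boto3.client("secretsmanager")
secret = json.loads(
    secrets.get_secret_value(SecretId="prod/redshift/etl-user")["SecretString"]
)

jdbc_url = f"jdbc:redshift://{secret['host']}:{secret['port']}/{secret['dbname']}"
username = secret["username"]
password = secret["password"]
```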