Question 4 - Top Amazon/AWS DEA-C01 Real Exam Questions [March 2026 Update]

Q: 4

A company uses an Amazon Redshift provisioned cluster as its database. The Redshift cluster has five reserved ra3.4xlarge nodes and uses key distribution. A data engineer notices that one of the nodes frequently has a CPU load over 90%. SQL queries that run on the node are queued. The other four nodes usually have a CPU load under 15% during daily operations. The data engineer wants to maintain the current number of compute nodes. The data engineer also wants to balance the load more evenly across all five compute nodes. Which solution will meet these requirements?

Options

Correct Answer:

Explanation

The scenario describes a significant workload imbalance: one node runs at over 90% CPU while the other four stay under 15%. This is a classic symptom of data skew in Amazon Redshift, where rows are not distributed evenly across the compute nodes. With KEY distribution, rows are placed according to the value of the distribution key (DISTKEY), and rows that share a key value are stored together. If the key has few distinct values, or one value dominates, most rows and most query work end up on a single node. To resolve the imbalance, the DISTKEY should be changed to a column with high cardinality (a large number of unique values) whose values are spread fairly evenly, so that rows are distributed uniformly across all five nodes. This addresses the root cause and balances the workload without changing the number of compute nodes, as required.
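The effect of key cardinality on row placement can be illustrated with a small simulation. This is only a sketch under stated assumptions: it uses MD5 as a stand-in hash (Redshift's internal hash function is different), models whole nodes rather than slices, and the 100,000-row table with a two-value status column and a unique ID column is hypothetical.

```python
import hashlib
from collections import Counter

NUM_NODES = 5  # matches the five-node cluster in the question


def node_for(value, num_nodes=NUM_NODES):
    """Map a key value to a node using a stand-in hash (not Redshift's real one)."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def rows_per_node(key_values, num_nodes=NUM_NODES):
    """Count how many rows land on each node for the given key column."""
    counts = Counter(node_for(v, num_nodes) for v in key_values)
    return [counts.get(n, 0) for n in range(num_nodes)]


def skew_ratio(counts):
    """Largest node's row count divided by the average; 1.0 is perfectly even."""
    return max(counts) / (sum(counts) / len(counts))


# Hypothetical low-cardinality key: 90% of rows share one value.
low_card = ["ACTIVE"] * 90_000 + ["CLOSED"] * 10_000
# Hypothetical high-cardinality key: a unique ID per row.
high_card = range(100_000)

low_counts = rows_per_node(low_card)
high_counts = rows_per_node(high_card)
print("low cardinality :", low_counts, round(skew_ratio(low_counts), 2))
print("high cardinality:", high_counts, round(skew_ratio(high_counts), 2))
```

With the low-cardinality key, at least 90,000 of the rows land on one node because identical key values always map to the same place; with the unique ID, the per-node counts come out close to equal.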

References

1. Amazon Redshift Documentation

"Choosing a data distribution style": "The goal of choosing a table distribution style is to distribute data as evenly as possible to parallelize the workload... If you specify KEY distribution, you must name a distribution key (DISTKEY) column... A column with high cardinality (a high number of unique values) helps distribute the data more evenly." This directly supports choosing a proper DISTKEY to resolve data skew.

2. Amazon Redshift Documentation

"Amazon Redshift best practices for designing tables": Under the section "Choose the best distribution style," the documentation states, "To use KEY distribution, name one column as the distribution key (DISTKEY). The distribution key should have high cardinality... Choosing a column with low cardinality results in data skew, where some nodes process more data than others." This confirms that an improper DISTKEY causes the exact problem described.

3. Amazon Redshift Documentation

"Working with sort keys": "Sorting enables efficient handling of range-restricted predicate queries. Amazon Redshift stores your data on disk in sorted order according to the sort key." This reference clarifies that the sort key's function is query optimization via data ordering, not data distribution across nodes.

4. Amazon Redshift Documentation

"Defining constraints": "Amazon Redshift doesn't enforce unique, primary key, and foreign key constraints... However, primary keys and foreign keys are used as planning hints and they should be declared if your ETL process or some other process in your application enforces them." This confirms that primary keys do not influence data placement.
