Question 10 - Hitachi Vantara HCE-5920 Real Exam Questions [Feb 2026 Update]

Q: 10

In a PDI transformation, you are retrieving data from a large lookup table using a Database Lookup step. To improve performance, you enable caching in the step and use the Load all data from table option. In this scenario, which three statements are correct about the data flow of the Database Lookup step? (Choose three.)

Options

Correct Answer:

B, C, E

Explanation

When the 'Database Lookup' step in Pentaho Data Integration (PDI) is configured to cache and "Load all data from table," it pre-loads the entire lookup table into memory before processing the first input row. This has three key implications for the data flow:

1. The entire dataset is stored in the Java Virtual Machine's (JVM) heap, meaning sufficient heap space must be allocated to prevent OutOfMemoryError exceptions (B).

2. Once cached, the comparisons between the input stream's keys and the cached keys are performed in memory. By default, these Java-based string comparisons are case-sensitive (C).

3. The step is designed to return a single set of values for each input row. If multiple rows in the lookup table match the input key, the step will use the data from only the first matching row it encounters in the cache (E).
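The "load everything, then keep the first match" behavior described above can be pictured with a small Java sketch. This is only an illustration under stated assumptions, not PDI's actual implementation: the class `LookupCacheSketch`, the method `loadAll`, and the sample rows are all hypothetical, and each row is assumed to carry its lookup key in column 0.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not PDI source code.
class LookupCacheSketch {

    // Reads every table row once and keeps the first row seen for each key.
    // Assumption: row[0] holds the lookup key, the remaining columns hold return values.
    static Map<String, String[]> loadAll(List<String[]> tableRows) {
        Map<String, String[]> cache = new HashMap<>();
        for (String[] row : tableRows) {
            // putIfAbsent ignores later rows that repeat an existing key.
            cache.putIfAbsent(row[0], row);
        }
        return cache;
    }

    public static void main(String[] args) {
        List<String[]> table = Arrays.asList(
            new String[] {"DE", "Germany"},
            new String[] {"DE", "Deutschland"}, // duplicate key, never returned
            new String[] {"FR", "France"});

        // The whole table lives on the heap from this point on.
        Map<String, String[]> cache = loadAll(table);

        System.out.println(cache.size());       // number of distinct keys
        System.out.println(cache.get("DE")[1]); // value from the first matching row
    }
}
```

Because the whole map stays on the heap for the lifetime of the step, its size grows with the number of distinct keys in the lookup table, which is why heap allocation matters in point 1.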

Why Incorrect

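A. This is incorrect. The step performs an enrichment (similar to a left outer join). By default, if no match is found, the new fields are populated with their default values (null when no default is specified), and the original row is passed through. Rows are dropped on a failed lookup only if the "Do not pass the row if the lookup fails" option is enabled.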
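A minimal Java sketch of this enrichment behavior follows. It assumes a simplified cache that maps a key to a single return value; the class `EnrichmentSketch`, the method `enrich`, and its parameters are hypothetical names chosen for illustration and do not come from PDI.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is not PDI source code.
class EnrichmentSketch {

    // Returns a copy of the input row with one extra field appended.
    // On a cache miss the extra field receives defaultValue (which may be null),
    // so the input row is always passed through.
    static String[] enrich(String[] inputRow, int keyIndex,
                           Map<String, String> cache, String defaultValue) {
        String found = cache.get(inputRow[keyIndex]);
        String[] out = Arrays.copyOf(inputRow, inputRow.length + 1);
        out[inputRow.length] = (found != null) ? found : defaultValue;
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<>();
        cache.put("DE", "Germany");

        String[] hit = enrich(new String[] {"1", "DE"}, 1, cache, null);
        String[] miss = enrich(new String[] {"2", "XX"}, 1, cache, null);

        System.out.println(Arrays.toString(hit));  // row plus the looked-up value
        System.out.println(Arrays.toString(miss)); // row plus the default (null here)
    }
}
```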

D. This is a best practice for predictable results, not a requirement. The step will execute if multiple matches exist, but it will non-deterministically use only the first match found.

References

1. Hitachi Vantara - Pentaho Documentation

"Database lookup" step:

Reference for B: The documentation describes the "Load all data from table" option for the cache. This action inherently requires memory. The "Pentaho Performance Tuning" guide frequently discusses allocating sufficient heap space (-Xmx) for transformations that cache large datasets. Section: Cache.

Reference for E: The documentation explicitly states, "If you expect more than one row to be returned, the returned value will be the first one encountered." This confirms that only one matching row is used. Section: Options.

Reference for A (Incorrectness): The presence of fields to specify default values for lookup fields when a match is not found confirms that rows are not dropped, but rather are passed on with null/default values. Section: The fields to return.

2. Pentaho Data Integration Official Documentation (General Principles):

Reference for C: The underlying engine for PDI is Java. In-memory lookups using standard Java data structures like HashMap rely on the equals() and hashCode() methods of key objects. For java.lang.String, these methods are case-sensitive. This is a fundamental principle of the platform's operation when performing in-memory comparisons. This behavior is detailed in general Java and PDI developer guides.
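The case sensitivity of String keys in a HashMap can be checked with plain Java. The sketch below uses only standard library calls; the class name `StringKeySketch`, the helper `hits`, and the sample key are hypothetical, and the example says nothing about how PDI builds its own cache keys.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of String keys in a HashMap; not PDI source code.
class StringKeySketch {

    // True only if the map holds a key that is equal (case-sensitively) to the probe.
    static boolean hits(Map<String, String> cache, String probe) {
        return cache.containsKey(probe);
    }

    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<>();
        cache.put("ABC-001", "Widget");

        System.out.println(hits(cache, "ABC-001")); // true: identical key
        System.out.println(hits(cache, "abc-001")); // false: differs only in case

        // String.equals is case-sensitive; equalsIgnoreCase exists but HashMap does not use it.
        System.out.println("ABC-001".equals("abc-001"));           // false
        System.out.println("ABC-001".equalsIgnoreCase("abc-001")); // true
    }
}
```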
