Data Lakes in OCI Object Storage store raw data for analysis. The three correct characteristics are:
Schema on read (C): Data Lakes store data in its raw, native format (e.g., JSON, CSV, Parquet) without
a predefined schema. The schema is applied when data is read or processed, not when written,
offering flexibility. For example, a Parquet file of sales data might be queried with SQL only when it is
analyzed, rather than being structured up front as in a database (see the sketch after this list of characteristics).
Multiple subject areas (D): Data Lakes aggregate data from diverse sources (sales, HR, IoT)
spanning multiple subject areas. This enables cross-domain analysis, such as joining customer data
with weather data, all stored in a single OCI bucket.
Mixed data types (E): Data Lakes support varied formats: structured (e.g., CSV tables), semi-
structured (e.g., JSON documents), and unstructured (e.g., videos). For instance, a bucket might hold
CSV logs, JSON events, and image files, all accessible for processing (the SDK sketch at the end of this
explanation uploads exactly such a mix).
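
To illustrate schema on read (C) and cross-domain analysis (D) together, here is a minimal sketch. It assumes the raw files sales.parquet and weather.csv (hypothetical names and columns) have already been copied locally from an Object Storage bucket and that the duckdb Python package is installed; the structure is imposed by the SQL at read time, not by the storage layer.

    import duckdb

    con = duckdb.connect()

    # Both files were written to the data lake with no schema declared up front;
    # the column names and types used below (region, amount, temperature, ...)
    # are hypothetical and are asserted only here, at query time.
    result = con.execute("""
        SELECT s.region,
               SUM(s.amount)      AS total_sales,
               AVG(w.temperature) AS avg_temperature
        FROM 'sales.parquet' AS s
        JOIN 'weather.csv'   AS w
          ON s.region = w.region AND s.sale_date = w.obs_date
        GROUP BY s.region
    """).fetchdf()

    print(result)

The same files can later be re-read with different queries, column selections, or type interpretations, which is exactly the flexibility that schema on read provides.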
The incorrect options are:
High concurrency (A): Data Lakes in Object Storage are not designed for high-concurrency
transactional access (e.g., thousands of simultaneous updates). They’re optimized for batch
processing or analytics, unlike Autonomous Transaction Processing (ATP), which is built for highly
concurrent transactional workloads.
High transaction performance (B): Transactional performance (e.g., fast commits) is a database
strength, not a Data Lake’s. Object Storage prioritizes scalability and durability over transactional
speed, making it unsuitable for OLTP workloads.
These traits make Data Lakes ideal for big data analytics, not real-time transactions.
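
To make the mixed data types point (E) concrete, below is a minimal sketch, assuming the OCI Python SDK (the oci package) is installed and configured through the default ~/.oci/config profile; the bucket name analytics-lake, the object names, and the local file names are hypothetical.

    import oci

    # Standard SDK setup: load the default config and create an Object Storage client.
    config = oci.config.from_file()
    client = oci.object_storage.ObjectStorageClient(config)
    namespace = client.get_namespace().data

    bucket = "analytics-lake"  # hypothetical bucket acting as the data lake

    # Structured (CSV), semi-structured (JSON), and unstructured (PNG) objects
    # land side by side in the same bucket; no schema is declared on write.
    uploads = [
        ("app.csv",     "logs/app.csv",       "text/csv"),
        ("clicks.json", "events/clicks.json", "application/json"),
        ("site.png",    "images/site.png",    "image/png"),
    ]

    for local_path, object_name, content_type in uploads:
        with open(local_path, "rb") as f:
            client.put_object(namespace, bucket, object_name, f,
                              content_type=content_type)

Whichever tool reads these objects later decides how to interpret each one; Object Storage itself only stores and serves the bytes.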
Reference: Oracle Cloud Infrastructure Documentation - Object Storage Overview