1. Huawei HCIA-Big Data V2.0 Course Material: The module describing HDFS architecture and principles explicitly states that HDFS suits scenarios requiring high throughput and fault tolerance for large files, using a streaming data access model. It also lists "not suitable for low-latency data access" and "inefficient for storing a large number of small files" as key limitations. (Reference: HCIA-Big Data V2.0 Training Material, Chapter 2: HDFS Distributed File System).
2. Official Vendor Documentation (Apache Hadoop): The official Apache Hadoop documentation for HDFS architecture outlines its core assumptions and goals. It states, "HDFS is built around the idea that the most efficient data processing pattern is a write-once/read-many-times pattern... A HDFS application needs high throughput data access... It is not a low-latency data access filesystem." (Reference: Apache Hadoop Documentation, HDFS Architecture Guide, Section: "Assumptions and Goals").
3. Academic Publication (Inspiration for HDFS): The design of HDFS was heavily influenced by Google's GFS. The original paper states, "The system is built from inexpensive commodity components that often fail... It must provide high aggregate throughput to many clients." This highlights the goals of fault tolerance and high throughput. (Reference: Ghemawat, S., Gobioff, H., & Leung, S. T. (2003). The Google File System. ACM SIGOPS Operating Systems Review, 37(5), 29-43. Section 2.1, "Assumptions". DOI: https://doi.org/10.1145/1165389.945450).
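The small-file limitation cited in item 1 can be made concrete with a back-of-the-envelope calculation. The figures below are widely cited rules of thumb rather than values taken from the sources above: the NameNode holds every file and block object in heap memory at roughly 150 bytes per object, and the default block size is 128 MB. The helper `namenode_memory` is our own illustration, not part of any Hadoop API.

```python
# Rough illustration of why HDFS handles many small files poorly.
# Assumptions (rules of thumb, not exact figures): ~150 bytes of
# NameNode heap per file or block object, 128 MB default block size.

BYTES_PER_METADATA_OBJECT = 150      # commonly cited rule of thumb
BLOCK_SIZE = 128 * 1024 * 1024       # default HDFS block size

def namenode_memory(num_files: int, file_size: int) -> int:
    """Approximate NameNode heap for num_files files of file_size bytes each."""
    blocks_per_file = max(1, -(-file_size // BLOCK_SIZE))  # ceiling division
    objects = num_files * (1 + blocks_per_file)            # 1 inode + its blocks
    return objects * BYTES_PER_METADATA_OBJECT

# The same 1 GB of data, stored two ways:
one_large = namenode_memory(1, 1024 * 1024 * 1024)  # a single 1 GB file
many_small = namenode_memory(10_000, 100 * 1024)    # 10,000 files of 100 KB

print(one_large)   # 1 file + 8 blocks -> 9 objects -> 1,350 bytes
print(many_small)  # 10,000 * (1 inode + 1 block) -> 20,000 objects -> 3,000,000 bytes
```

Under these assumptions the same gigabyte of data costs the NameNode over 2,000 times more memory when split into 100 KB files, which is why the course material flags small files as a key limitation.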