1. Zhang, Z., Zhao, Z., Zheng, Y., Cui, Z., & Chen, E. (2023). A Survey on Data Quality for Trustworthy AI. IEEE Transactions on Knowledge and Data Engineering, 1-21. DOI: https://doi.org/10.1109/TKDE.2023.3324511. The paper emphasizes that high-quality data is a cornerstone for building trustworthy AI, and data cleaning is a primary method to improve data quality.
2. Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer. In Chapter 1, the text introduces the machine learning workflow, highlighting the essential nature of a "pre-processing" stage (p. 4) to transform raw data into a suitable format, which includes cleaning and normalization, before any model training can occur.
3. Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer. The book implicitly and explicitly discusses the need for data preparation and handling issues like missing values throughout its chapters, establishing it as a foundational step before applying learning algorithms.