1. Ng, A. (2018). Machine Learning Yearning. Chapter 11: Splitting your data. This chapter explains the critical role of a development (or validation) set: it is used to "evaluate ideas" and provides an unbiased measure of model performance, which is essential for decisions such as shipping a product.
2. Google. (n.d.). Machine Learning Crash Course: Validation Set. Google Developers. Retrieved from https://developers.google.com/machine-learning/crash-course/validation/what-is-a-validation-set. This official documentation states, "The validation set is used to evaluate the model's performance during development... to check if the model is generalizing well to unseen data," which directly corresponds to confirming performance against thresholds.
3. Wirth, R., & Hipp, J. (2000). CRISP-DM: Towards a standard process model for data mining. Proceedings of the 4th International Conference on the Practical Applications of Knowledge Discovery and Data Mining, 29-39. The "Evaluation" phase of the CRISP-DM methodology (Section 2.5) explicitly details the task of "Evaluate Results," which involves assessing the model against business success criteria (i.e., performance thresholds) before proceeding to deployment.