1. Barocas, S., & Selbst, A. D. (2016). Big Data's Disparate Impact. California Law Review, 104(3), 671-732. In the section on "Proxies," the authors explain how seemingly neutral attributes such as zip codes can act as proxies for protected class membership, so that a model produces proxy discrimination even when the protected attribute itself is excluded (p. 683). https://doi.org/10.15779/Z38BG31
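The proxy mechanism the entry describes can be illustrated with a small synthetic sketch (all names and numbers here are hypothetical, not from the cited paper): residential segregation makes zip code strongly correlated with group membership, so a decision rule that never sees the protected attribute still disadvantages one group.

```python
import random

random.seed(0)

def make_person():
    """Hypothetical population: group strongly predicts zip code."""
    group = random.choice(["A", "B"])
    if group == "A":
        zip_code = "94100" if random.random() < 0.9 else "94200"
    else:
        zip_code = "94200" if random.random() < 0.9 else "94100"
    return group, zip_code

people = [make_person() for _ in range(10_000)]

def approve(zip_code):
    """A 'neutral' rule that uses only zip code, never the group."""
    return zip_code == "94100"

counts = {"A": 0, "B": 0}
approvals = {"A": 0, "B": 0}
for group, z in people:
    counts[group] += 1
    approvals[group] += approve(z)
approval = {g: approvals[g] / counts[g] for g in "AB"}
print(approval)  # group A is approved far more often than group B
```

Even though the rule conditions only on zip code, the approval rates diverge sharply by group, which is exactly the proxy effect the authors analyze.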
2. Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., & Galstyan, A. (2021). A Survey on Bias and Fairness in Machine Learning. ACM Computing Surveys, 54(6), Article 115. The section on sampling bias details how non-representative data samples, such as those drawn from a single geographic location, yield biased models that generalize poorly to populations outside the sample. https://doi.org/10.1145/3457607
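A minimal synthetic sketch of this sampling-bias failure mode (the regions, ranges, and the trivial majority-class "model" are illustrative assumptions, not from the survey): a model fit only on one region's data performs well there but collapses on another region whose feature distribution differs.

```python
import random

random.seed(1)

def sample(region, n):
    """The true rule (label = feature > 5) is shared across regions,
    but the regions cover different feature ranges."""
    data = []
    for _ in range(n):
        x = random.uniform(0, 6) if region == "north" else random.uniform(4, 10)
        data.append((x, x > 5))
    return data

train = sample("north", 2_000)       # biased sample: one region only
test_north = sample("north", 2_000)
test_south = sample("south", 2_000)

# "Model": predict the majority label observed in training.
majority = sum(y for _, y in train) >= len(train) / 2

def accuracy(data):
    return sum(majority == y for _, y in data) / len(data)

print(accuracy(test_north), accuracy(test_south))
```

In the north, positives are rare, so predicting the training majority scores well there yet fails badly in the south, where the label distribution is reversed: the model's apparent quality is an artifact of the unrepresentative sample.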
3. Stanford University. (n.d.). CS 182: Ethics, Public Policy, and Technological Change [Course materials]. The course materials discuss how biased datasets resulting from unrepresentative sampling are a primary source of algorithmic unfairness and discriminatory outcomes; proxy variables such as zip codes are a key topic in the lectures on algorithmic bias.