1. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. Chapter 3, "Probability and Information Theory," establishes the probabilistic framework that underpins modern machine learning models, including generative AI.
2. Stanford University. (2023). CS224N: Natural Language Processing with Deep Learning, Lecture 11: Language Models and RNNs, Part 2. This lecture explains that language models work by producing a probability distribution over the vocabulary for the next word, P(x_t | x_1, ..., x_{t-1}), and then sampling from this distribution to generate text; this sampling is the source of variability (see the first sketch after this list).
3. Holtzman, A., Buys, J., Du, L., Forbes, M., & Choi, Y. (2019). The Curious Case of Neural Text Degeneration. Proceedings of the International Conference on Learning Representations (ICLR). This paper details sampling strategies (e.g., nucleus sampling) that are explicitly probabilistic and designed to control the randomness in text generation, producing higher-quality, non-deterministic outputs (see the second sketch after this list). (Available at: https://arxiv.org/abs/1904.09751)
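
To make the generation step described in reference 2 concrete, here is a minimal sketch. The `vocab` and `logits` values are illustrative toy stand-ins, not taken from the lecture: a softmax turns the model's raw scores into the distribution P(x_t | x_1, ..., x_{t-1}), and sampling from it, rather than always taking the argmax, is exactly what makes repeated generations differ.

```python
import numpy as np

# Toy stand-ins for a language model's next-word scores; illustrative only.
vocab = ["the", "cat", "sat", "on", "mat"]
logits = np.array([2.0, 1.0, 0.5, 0.2, 0.1])

# Softmax converts raw scores into the distribution P(x_t | x_1, ..., x_{t-1}).
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Sampling (not argmax) is the source of variability: repeated calls
# can return different next words from the same distribution.
rng = np.random.default_rng()
print(rng.choice(vocab, p=probs))
```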
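
And here is a sketch of nucleus (top-p) sampling in the spirit of reference 3, assuming a plain NumPy probability vector; the function name and the p=0.9 default are illustrative choices, not the paper's code. The sampler keeps only the smallest set of most-probable tokens whose cumulative mass exceeds p, renormalizes, and samples within that "nucleus," truncating the unreliable low-probability tail that the paper identifies as a source of degenerate text.

```python
import numpy as np

def nucleus_sample(probs, p=0.9, rng=None):
    """Sample a token index from the smallest set of top tokens whose
    cumulative probability exceeds p (top-p / nucleus sampling)."""
    rng = rng or np.random.default_rng()
    order = np.argsort(probs)[::-1]              # indices, most probable first
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1  # size of the nucleus
    nucleus = order[:cutoff]
    renormalized = probs[nucleus] / probs[nucleus].sum()
    return rng.choice(nucleus, p=renormalized)

# Example: with p=0.9 the two low-probability tail tokens fall outside
# the nucleus and can never be drawn.
probs = np.array([0.5, 0.3, 0.14, 0.04, 0.02])
print(nucleus_sample(probs))
```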