1. Amazon Bedrock User Guide, "Inference parameters for foundation models": This official documentation describes the effect of each of the relevant inference parameters:
Temperature: "Use a lower value to decrease randomness in the response."
Top K: "A lower value limits the choices to more probable tokens."
Top P: "A lower value limits the choices to more probable tokens."
This confirms that lowering these values reduces randomness and promotes more deterministic output; the request sketch below shows where each parameter might be set.
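For illustration only, a minimal sketch of passing these parameters to a model on Bedrock. It assumes boto3, access to an Anthropic Claude model, and the Anthropic messages request format; the region, model ID, and prompt are placeholder assumptions, not values from the cited documentation.

    import json
    import boto3

    # Placeholder region and model ID; substitute a model you actually have access to.
    client = boto3.client("bedrock-runtime", region_name="us-east-1")

    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        # Lower values reduce randomness, per the Bedrock inference-parameter docs.
        "temperature": 0.1,
        "top_k": 10,
        "top_p": 0.5,
        "messages": [{"role": "user", "content": "Summarize the return policy in one sentence."}],
    }

    response = client.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        contentType="application/json",
        accept="application/json",
        body=json.dumps(body),
    )
    print(json.loads(response["body"].read())["content"][0]["text"])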
2. Holtzman, A., et al. (2020). "The Curious Case of Neural Text Degeneration." International Conference on Learning Representations (ICLR). This peer-reviewed paper discusses various decoding strategies.
Section 2, "Decoding Strategies": The paper explains that low-temperature sampling approaches greedy decoding (the most deterministic strategy), and that nucleus sampling (top-p) and top-k sampling truncate the vocabulary to the high-probability tokens, thus controlling randomness. Lowering k or p makes the output less random; a short sketch of these strategies follows this item.
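As an illustration of these strategies (not code from the paper), a sketch of temperature scaling plus nucleus (top-p) truncation applied to a vector of logits:

    import numpy as np

    def sample_next_token(logits, temperature=1.0, top_p=1.0, seed=None):
        """Sample one token id using temperature scaling and nucleus (top-p) truncation."""
        rng = np.random.default_rng(seed)

        # Temperature < 1 sharpens the distribution toward the argmax, approaching greedy decoding.
        scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-8)
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()

        # Nucleus sampling: keep the smallest set of tokens whose cumulative probability reaches top_p.
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        kept = order[: int(np.searchsorted(cumulative, top_p) + 1)]

        kept_probs = probs[kept] / probs[kept].sum()
        return int(rng.choice(kept, p=kept_probs))

    logits = [4.0, 3.0, 1.0, 0.5, -2.0]
    print(sample_next_token(logits, temperature=0.7, top_p=0.9, seed=0))

Lowering temperature or top_p shrinks the kept set, so repeated calls concentrate on the same few high-probability tokens.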
3. Stanford University, CS224N: NLP with Deep Learning, Winter 2023 Lecture Slides:
Lecture 10, "Language Models and Recurrent Neural Networks", Slides 73-76: These slides detail decoding algorithms. They explain that a lower temperature makes generation more conservative (less random) and that top-k and top-p (nucleus) sampling restrict the candidate set to the most probable tokens, thereby reducing randomness in the output; a top-k sketch follows this item.
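For the top-k case, an analogous sketch (again illustrative, not taken from the slides) that masks everything outside the k most probable tokens:

    import numpy as np

    def top_k_filter(logits, k):
        """Keep only the k highest-scoring tokens; everything else becomes unsampleable."""
        logits = np.asarray(logits, dtype=float)
        filtered = np.full_like(logits, -np.inf)
        top_indices = np.argpartition(logits, -k)[-k:]
        filtered[top_indices] = logits[top_indices]
        return filtered

    # k = 1 reduces to greedy decoding; larger k admits more candidates and more randomness.
    print(top_k_filter([2.0, 1.5, 0.3, -1.0], k=2))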