1. IBM watsonx.ai Documentation: The official documentation for tuning foundation models states: "If you want output that is more predictable and focused, try a lower temperature. If you want more variety, try a higher temperature." This supports decreasing the temperature to address verbose and unfocused (repetitive) output.
Source: IBM Cloud Docs, "Tuning foundation models", section "Decoding parameters".
2. Stanford University Courseware (CS224N): Lecture materials on sequence generation explain that temperature scales the logits before the softmax function is applied. A lower temperature sharpens the probability distribution, favoring the most likely tokens and producing less random, more focused text; a higher temperature flattens the distribution, increasing randomness.
Source: Stanford CS224N: NLP with Deep Learning, lecture notes on sequence models and generation.
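A minimal sketch of the scaling the lecture materials describe, using made-up logits for four candidate tokens (the function name and values are illustrative only):

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Divide logits by the temperature before applying the softmax (toy example)."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()            # subtract max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

logits = [2.0, 1.0, 0.5, -1.0]        # made-up logits for four candidate tokens

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature=t)
    print(f"T={t}: {np.round(probs, 3)}")
# Lower T concentrates probability on the top token (sharper distribution);
# higher T spreads probability more evenly (flatter distribution).
```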
3. Academic Publication on Text Generation: The paper "The Curious Case of Neural Text Degeneration" discusses decoding strategies. It notes that high-likelihood, human-like text occupies a specific region of the probability distribution, avoiding both the repetitive output of greedy decoding (the low-temperature limit) and the incoherence of highly random sampling (the high-temperature limit). Lowering the temperature from a high value moves the output toward this desired range.
Source: Holtzman, A., et al. (2019). "The Curious Case of Neural Text Degeneration". arXiv preprint arXiv:1904.09751. (Section 3 discusses sampling methods and the effect of temperature.)
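To illustrate the two extremes the paper contrasts, here is a toy sampling sketch with an invented five-token vocabulary and made-up logits; in the low-temperature limit sampling collapses toward greedy decoding (the same token repeats), while a high temperature spreads draws across unlikely tokens:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_tokens(logits, temperature, n=20):
    """Draw n tokens from a temperature-scaled softmax over a toy vocabulary."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(logits), size=n, p=probs)

logits = [3.0, 1.5, 1.0, 0.5, 0.0]    # made-up logits for a 5-token vocabulary

print("T=0.1:", sample_tokens(logits, 0.1))   # near-greedy: almost always token 0 (repetitive)
print("T=2.0:", sample_tokens(logits, 2.0))   # near-uniform: many different tokens (incoherent)
print("T=0.7:", sample_tokens(logits, 0.7))   # intermediate: mostly likely tokens, some variety
```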