Sampling & temperature

How a model turns a probability distribution into the next token.

Language models · 7 min · interactive

At every step, a language model assigns a probability to every token in its vocabulary. It usually doesn't just take the most likely one; it samples from that distribution. How it samples is what makes a model feel robotic or creative, reliable or unhinged.
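To make the difference between "take the most likely token" and "sample" concrete, here is a minimal sketch. The four-word vocabulary, the probabilities, and the helper names `greedy` and `sample` are made up for illustration; a real model has tens of thousands of tokens and computes the probabilities itself.

```python
import random

# Hypothetical next-token distribution (illustrative numbers only).
vocab = ["the", "a", "cat", "zebra"]
probs = [0.6, 0.25, 0.1, 0.05]

def greedy(probs):
    """Return the index of the single most likely token."""
    return max(range(len(probs)), key=lambda i: probs[i])

def sample(probs, rng):
    """Draw one token index at random, weighted by its probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

rng = random.Random(42)  # fixed seed so repeated runs match
greedy_pick = vocab[greedy(probs)]
sampled_picks = [vocab[sample(probs, rng)] for _ in range(10)]

print("greedy: ", greedy_pick)     # always the top token
print("sampled:", sampled_picks)   # mostly likely tokens, occasionally rarer ones
```

Greedy selection returns the same token every time, while sampling picks each token roughly in proportion to its probability, which is where the variety comes from.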

Temperature

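Temperature divides the model's raw scores (logits) before the softmax turns them into probabilities. A low temperature (below 1) exaggerates the gaps between tokens, so the model becomes nearly greedy and deterministic as the temperature approaches 0. A high temperature (above 1) shrinks the gaps, so tokens become more equally likely and the output gets more surprising and more error-prone. A temperature of 1 leaves the distribution unchanged.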
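The effect of temperature can be sketched in a few lines. This is an illustrative version, not any particular library's implementation: the function name `softmax_with_temperature` and the three example scores are assumptions chosen for the demo.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Divide raw scores by the temperature, then apply softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three tokens.
logits = [2.0, 1.0, 0.0]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```

Running this, the top token's probability grows as the temperature drops and shrinks as it rises, while the probabilities still sum to 1 in every case.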

Top-k and top-p

These trim the long tail of the distribution before sampling. Top-k keeps only the k most likely tokens; top-p (nucleus sampling) keeps the smallest set of most likely tokens whose probabilities add up to at least p. The surviving probabilities are then renormalized so they sum to 1. Both stop the model from occasionally picking absurd low-probability tokens while still leaving room for variety among the plausible ones.
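Both filters can be sketched as functions that zero out the tail and renormalize what is left. The names `top_k_filter`, `top_p_filter`, and `_renormalize`, along with the five example probabilities, are hypothetical and chosen for illustration; production libraries typically apply the same idea to logits on tensors.

```python
import random

def _renormalize(probs, keep):
    """Zero every token not in `keep` and rescale the rest to sum to 1."""
    keep = set(keep)
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

def top_k_filter(probs, k):
    """Keep only the k most likely tokens."""
    if k <= 0:
        raise ValueError("k must be at least 1")
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _renormalize(probs, order[:k])

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose probabilities reach p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return _renormalize(probs, keep)

# Hypothetical distribution with a long tail.
probs = [0.5, 0.3, 0.15, 0.04, 0.01]

print(top_k_filter(probs, 2))    # only the two most likely tokens survive
print(top_p_filter(probs, 0.9))  # the top three are needed to reach 0.9

rng = random.Random(0)
filtered = top_p_filter(probs, 0.9)
token = rng.choices(range(len(probs)), weights=filtered, k=1)[0]
print(token)  # always one of the surviving tokens
```

Note that top-k always keeps a fixed number of tokens, while top-p keeps more tokens when the distribution is flat and fewer when it is peaked.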
