Sampling & temperature
How a model turns a probability distribution into the next token.
At every step a language model outputs a probability for every token in its vocabulary. It doesn't just take the most likely one — it samples. How it samples is what makes a model feel robotic or creative, reliable or unhinged. Play with the knobs below and watch the distribution change.
Temperature near 0 makes the model greedy — it almost always takes the top token (deterministic, repetitive). Higher temperature flattens the distribution, giving rarer tokens a real chance (creative, but riskier). Top-k / top-p first chop off the unlikely tail, then sampling happens only among what's left.
Temperature
Temperature divides the model's raw scores (logits) before softmax turns them into probabilities. A temperature below 1 exaggerates the gaps between tokens, so the model gets greedier and more deterministic; a temperature above 1 shrinks them, so the probabilities move closer together and output gets more surprising and more error-prone. A temperature of exactly 1 leaves the distribution unchanged.
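To make the division concrete, here is a minimal sketch in plain Python. The function name softmax_with_temperature and the three example scores are made up for illustration; real models work on vocabulary-sized arrays and usually handle a temperature of 0 as a separate greedy (argmax) case.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Divide raw scores by the temperature, then apply softmax."""
    if temperature <= 0:
        # Assumption: temperature 0 is handled elsewhere as plain argmax.
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate tokens.
logits = [2.0, 1.0, 0.1]
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```

Running it should show the top token taking nearly all of the probability at T=0.2 and the three tokens moving closer together at T=2.0.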
Top-k and top-p
These trim the long tail before sampling. Top-k keeps only the k most likely tokens; top-p (nucleus sampling) keeps the smallest set of most likely tokens whose probabilities add up to at least p. The surviving probabilities are then renormalized so they sum to 1. Both stop the model from occasionally picking absurd low-probability tokens, while still leaving room for variety among the plausible ones.
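The two filters can be sketched in a few lines of plain Python. This is an illustration under simple assumptions, not how any particular library implements them: the helper names (top_k_filter, top_p_filter, _renormalize) and the five-token distribution are hypothetical, k is assumed to be at least 1, and p is assumed to lie in (0, 1].

```python
import random

def _renormalize(probs, keep):
    """Zero out every token not in `keep` and rescale the rest to sum to 1."""
    kept = set(keep)
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]

def top_k_filter(probs, k):
    """Keep only the k most likely tokens (assumes k >= 1)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _renormalize(probs, order[:k])

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return _renormalize(probs, keep)

# Hypothetical next-token distribution over a five-token vocabulary.
probs = [0.5, 0.3, 0.15, 0.04, 0.01]
print(top_k_filter(probs, 2))    # only the two most likely tokens survive
print(top_p_filter(probs, 0.9))  # the two least likely tokens are dropped

# Sampling then happens only among the tokens that survived the filter.
token = random.choices(range(len(probs)), weights=top_p_filter(probs, 0.9), k=1)[0]
print("sampled token index:", token)
```

Note that top-k always keeps a fixed number of tokens, while top-p keeps more tokens when the distribution is flat and fewer when it is peaked.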