How models are trained

A brief intro to pretraining, fine-tuning, and RLHF — how a pile of text becomes a model that follows instructions.

Language models · 9 min · interactive

A chat model doesn't spring into existence knowing how to be helpful. It's built in stages, and each stage has a very different job — different data, a different objective, and a different thing it produces. Understanding those stages explains a lot of model behaviour: why a model knows so much yet sometimes ignores your instructions, or why two models with similar knowledge feel so different to talk to.

Here is the whole training loop in miniature: text is sampled from a giant corpus, tokenized into ids, pushed through the network, and scored against the tokens that actually came next. The error then flows backwards to nudge every weight. Over a few optimizer steps the loss falls and the connection strengths change, and those weights are the model.
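The loop described above can be sketched in a few lines of plain Python. This is a toy illustration under loud assumptions, not how a real model is built: the "network" is a single table of bigram logits, the "tokens" are characters, and `train_bigram` and all its parameters are hypothetical names chosen for this example.

```python
import math
import random

def train_bigram(text, steps=200, lr=0.5, batch_size=8, seed=0):
    """Toy next-token loop: sample, tokenize, forward, score, backprop, update."""
    rng = random.Random(seed)
    vocab = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(vocab)}
    ids = [stoi[ch] for ch in text]          # tokenize: characters -> integer ids
    V = len(vocab)
    # The "weights": W[i][j] is the logit for next token j given current token i.
    W = [[0.0] * V for _ in range(V)]
    losses = []
    for _ in range(steps):
        # Sample a small batch of positions from the corpus.
        positions = [rng.randrange(len(ids) - 1) for _ in range(batch_size)]
        grad = [[0.0] * V for _ in range(V)]
        total = 0.0
        for p in positions:
            cur, nxt = ids[p], ids[p + 1]
            # Forward pass: softmax over the logits for the current token.
            logits = W[cur]
            m = max(logits)
            exps = [math.exp(z - m) for z in logits]
            s = sum(exps)
            probs = [e / s for e in exps]
            # Score against the token that actually came next (cross-entropy).
            total += -math.log(probs[nxt])
            # Backward pass: d(loss)/d(logit_j) = prob_j - 1[j == next].
            for j in range(V):
                grad[cur][j] += (probs[j] - (1.0 if j == nxt else 0.0)) / batch_size
        # Optimizer step: plain gradient descent on every weight.
        for i in range(V):
            for j in range(V):
                W[i][j] -= lr * grad[i][j]
        losses.append(total / batch_size)
    return W, losses, vocab

W, losses, vocab = train_bigram("the cat sat on the mat. " * 20)
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

Because every logit starts at zero, the first loss equals the log of the vocabulary size (a uniform guess); it then drifts downward as the table learns which characters tend to follow which.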

Zooming out, that same loop is run in three distinct training stages, each with different data, a different objective, and a different result. The first stage, pretraining, looks like this:

Input data: A massive pile of internet text, books, and code: trillions of tokens, mostly unlabeled.

Objective: Predict the next token, over and over, across everything it reads.

Produces: A “base model” that has absorbed grammar, facts, and patterns, but only knows how to continue text, not how to be helpful. Ask it a question and it might reply with more questions.

In a sentence: “Read the whole library until you can finish almost any sentence.”

Not every model uses all three stages, and the names vary — but this pretraining → fine-tuning → preference-tuning shape is the backbone of almost every modern chat model.

1 · Pretraining: learning language

The model is shown an enormous amount of text and given one relentless task: predict the next token. To get good at that, it has to implicitly learn grammar, facts, reasoning patterns, a little code, several languages — because all of those help it guess what comes next. The output is a base model: deeply knowledgeable, but not yet an assistant. It only knows how to continue text, so a question might just produce more questions.
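Written as a formula, "predict the next token" is usually an average cross-entropy over a sequence. The notation here is an assumption introduced for illustration: $x_1, \dots, x_T$ is a token sequence from the corpus and $p_\theta$ is the model's predicted distribution under weights $\theta$.

```latex
\mathcal{L}(\theta) \;=\; -\frac{1}{T-1} \sum_{t=1}^{T-1} \log p_\theta\!\left(x_{t+1} \mid x_1, \dots, x_t\right)
```

Every position in the text is its own small prediction problem, which is why unlabeled text is enough: the "label" for each position is simply the token that follows it.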

2 · Supervised fine-tuning: learning to be helpful

Next, humans write a comparatively small set of high-quality prompt → ideal-answer examples, and the model is trained to imitate them. This is where it learns the format of being useful: follow the instruction, stay on topic, answer directly. The result is an instruct model — the first version that feels like it's actually responding to you.
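One way to picture "trained to imitate them" is the same next-token loss, applied only to the answer part of each example. The sketch below assumes that prompt tokens are masked out of the loss, which is a common choice but is not stated in the text above; the function names and token ids are hypothetical.

```python
def build_sft_example(prompt_ids, answer_ids, eos_id):
    """Concatenate prompt and ideal answer; mark which positions are trained on."""
    input_ids = list(prompt_ids) + list(answer_ids) + [eos_id]
    # 0 = ignore (prompt tokens), 1 = train on (answer tokens and end-of-sequence).
    loss_mask = [0] * len(prompt_ids) + [1] * (len(answer_ids) + 1)
    return input_ids, loss_mask

def masked_next_token_loss(token_logprobs, loss_mask):
    """Average negative log-likelihood over masked-in positions only.

    token_logprobs[i] is assumed to be the model's log-probability of
    input_ids[i] given all the tokens before it.
    """
    picked = [-lp for lp, m in zip(token_logprobs, loss_mask) if m]
    return sum(picked) / len(picked)

# Made-up ids: a 3-token prompt, a 2-token answer, and an end-of-sequence id of 0.
input_ids, loss_mask = build_sft_example([5, 6, 7], [8, 9], eos_id=0)
print(input_ids, loss_mask)
```

With this masking, the model is never penalized for failing to predict the prompt itself; the gradient comes only from how well it reproduces the ideal answer.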

3 · Preference tuning: learning taste

Finally, the model's answers are refined against human preferences. People rank competing answers, and the model is nudged toward the ones humans prefer. Classically this is done by training a reward model on those rankings and optimizing against it with reinforcement learning (reinforcement learning from human feedback, RLHF), or more directly with methods like Direct Preference Optimization (DPO). This stage shapes the harder-to-specify qualities: helpfulness, honesty, harmlessness, tone, and knowing when to decline.
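Both routes mentioned above reduce to a loss over pairs of answers, one preferred and one not. The sketch below shows the per-pair arithmetic only, under stated assumptions: the reward-model loss is the standard Bradley-Terry pairwise form, the DPO loss follows its published formula, and the function names, example numbers, and `beta` default are chosen here for illustration. The inputs to `dpo_loss` are assumed to be whole-answer log-probabilities under the model being trained and under a frozen reference model.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def reward_model_loss(reward_chosen, reward_rejected):
    """Pairwise (Bradley-Terry style) loss: low when the chosen answer scores higher."""
    return -log_sigmoid(reward_chosen - reward_rejected)

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, skipping the separate reward model."""
    # How much more likely each answer has become relative to the reference model.
    chosen_gain = policy_chosen - ref_chosen
    rejected_gain = policy_rejected - ref_rejected
    return -log_sigmoid(beta * (chosen_gain - rejected_gain))

# Before any tuning the policy equals the reference, so the loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Shifting probability toward the chosen answer lowers the loss.
print(dpo_loss(-9.0, -13.0, -10.0, -12.0))
```

In both cases the loss only cares about the difference between the two answers, which is what lets relative human rankings stand in for absolute quality scores.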

Why this matters

Most of what you experience as a model's “personality” comes from stages 2 and 3, while most of what it knows comes from stage 1. That split explains a lot: fine-tuning can make a model friendlier or safer without teaching it new facts, and a model can confidently state something wrong because next-token prediction rewards fluent, plausible text — not necessarily true text.

Up next in this track: sampling & temperature (how a trained model actually picks each token).