SFT: Imitation, and the Ceiling It Hits
Supervised fine-tuning is the imitation step of post-training: show the model (prompt → ideal answer) pairs and minimize cross-entropy on the target tokens. It teaches the format of being an assistant — and hits a ceiling that preference learning exists to break.
- sft
- fine-tuning
- post-training
- instruction-tuning
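
The objective in the summary above, cross-entropy on the target tokens only, can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a training recipe: `sft_loss`, `log_softmax`, and `prompt_len` are hypothetical names chosen here, and the logits are toy values standing in for a real model's output. It assumes the usual next-token setup, where the score vector at position t predicts the token at position t + 1, and it skips every position whose target still belongs to the prompt.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vector of scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sft_loss(logits, token_ids, prompt_len):
    """Mean next-token cross-entropy over the answer tokens only.

    logits[t] is the (hypothetical) model's score vector for predicting
    token_ids[t + 1]. Positions whose target lies inside the prompt are
    skipped, so only the ideal answer contributes to the loss.
    """
    total, count = 0.0, 0
    for t in range(len(token_ids) - 1):
        target_pos = t + 1
        if target_pos < prompt_len:
            continue  # prompt tokens are context, not targets
        total -= log_softmax(logits[t])[token_ids[target_pos]]
        count += 1
    if count == 0:
        raise ValueError("no answer tokens to train on")
    return total / count


# Toy example: vocabulary of 4 tokens, 2 prompt tokens, 2 answer tokens.
token_ids = [0, 1, 2, 3]
uniform_logits = [[0.0] * 4 for _ in token_ids]
loss = sft_loss(uniform_logits, token_ids, prompt_len=2)
print(round(loss, 4))  # 1.3863, i.e. log(4) for a uniform prediction
```

Masking the prompt is a design choice assumed here rather than something the summary mandates: it keeps the gradient focused on reproducing the ideal answer instead of on re-predicting the user's own prompt.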