How LLM Generation Works: Transformer, Sampling, Tokens, Batching, and Validation
What happens inside a transformer when you send a prompt, and how the practical knobs — temperature, max_tokens, structured outputs, batching strategy, retry-with-catch-up — fall out of that picture.
- llm
- transformers
- attention
- tokenization