lenatriestounderstand

Section 3 of 5

LLM

Transformer architecture, the generation loop, prompt engineering, structured outputs, agent patterns.

10 chapters in this section.

01

How LLM Generation Works: Transformer, Sampling, Tokens, Batching, and Validation

What happens inside a transformer when you send a prompt, and how the practical knobs — temperature, max_tokens, structured outputs, batching strategy, retry-with-catch-up — fall out of that picture.

  • llm
  • transformers
  • attention
  • tokenization
Updated May 27, 2026
02

Attention Is All You Need — But Not All Attention Is the Same

Why modern LLMs are no longer just decoder-only transformers with standard multi-head attention. Attention has become a design space — MHA, MQA, GQA, MLA, sliding-window, sparse, linear, recurrent, hybrid — plus position encoding, attention sinks, and KV-cache compression. Each variant solves a different bottleneck.

  • attention
  • kv-cache
  • long-context
03

Prompt Engineering

What separates a working LLM prompt from a flaky one in 2026 — instruction hierarchy, in-context learning, chain-of-thought, structured outputs, reasoning-model specifics, and the prompt-injection trust boundary.

  • prompt-engineering
Updated May 8, 2026
04

The Physics of Hallucination

What hallucination looks like at the level of the transformer's internal computation — distributed representations, signal competition in the residual stream, the softmax bottleneck, the activation-output gap, and the architectural reasons there is no first-class epistemic channel.

  • hallucinations
Updated May 8, 2026
05

The Hindsight Corpus: Time in LLM Pretraining Data

Saying a model was 'trained on text written before T' invites a picture of human knowledge as of T. The actual corpus is volumetrically skewed toward recent years, dominated by retroactively-edited sources like Wikipedia, missing reliable per-document timestamps, and survivor-biased for older periods. Covers the mechanisms, the failure modes that fall out, what's silently absent from datasheets, and what time-aware pretraining would have to do differently.

  • pretraining
  • training-data
  • temporal
Updated May 13, 2026
06

LLM Agent Architectures

Agent architecture is where LLM engineering stops being mostly about prompts and starts looking like distributed systems. Covers workflows vs agents, the classical loop, five paradigms (ReAct, Function Calling, Plan-and-Execute, Reflection, CodeAct), MCP as the protocol layer above per-vendor function calling, multi-agent patterns, computer use, memory and resumability, production failure modes including indirect prompt injection, tool security, cost levers, and observability.

  • agent-architectures
  • tool-use
  • mcp
  • multi-agent
Updated May 27, 2026
07

The Missing Now: Temporal Grounding in LLM Agents

A chat transcript preserves order but not elapsed time, world state, or whether earlier hypotheses have expired. For long-running agents, temporal grounding is a runtime problem, not a model problem. Covers what 'now' actually is, the failure modes that fall out when context gets treated as state, the primitives (clocks, event logs, state reducers, expectations, monitors) that close the gap, and how to measure whether it works.

  • agents
  • temporal-grounding
  • state-management
Updated May 13, 2026
08

Fine-Tuning LLMs: When the Weight Delta Is Worth It

Fine-tuning is not prompt repair. It is a decision to write a reusable parameter delta into an existing checkpoint. That delta changes future logits, defaults, and trade-offs. This note is about when that is worth doing: what fine-tuning actually buys, how to tell whether a gap belongs in the weights, and why data, evals, forgetting, and probability shape matter more than the slogan 'just fine-tune it'.

  • fine-tuning
  • residual-stream
09

Fine-Tuning LLMs: Post-Training Is a Pipeline, Not a Step

Post-training is not one fine-tuning method. It is a sequence of objective signals. Continued pretraining teaches substrate, SFT teaches examples and defaults, preference optimization teaches comparisons, RLVR teaches verifiable trajectories, and distillation transfers the resulting behavior. The important design question is not which acronym is fashionable, but which stage matches the behavior you are trying to install.

  • fine-tuning
  • post-training
  • sft
  • dpo
10

Fine-Tuning LLMs: Modern Post-Training Deep Dive

A reference-style deep dive into the modern knobs around post-training: preference optimization variants, LoRA and PEFT methods, memory-efficient full fine-tuning, model merging, distillation, tooling, and serving. Read this after the pipeline note, once you know which stage you actually need.

  • fine-tuning
  • dpo
  • lora
  • qlora