Engineering Understanding

Lena Tries To Understand

Interactive notes on AI, ML, data, systems, and the beautiful things behind them.

Read. Run. Understand.

Python-first
Run in-browser

LLM / preference-learning.ipynb

How LLMs Learn Human Preferences: RLHF, RLAIF and Beyond

Nothing in next-token prediction tells a model which of two fluent answers is better — preference learning installs that judgment. RLHF, reward models, PPO/DPO, and where a model's personality and sycophancy are quietly decided.

In [1]: reward_model.preference_prob(delta_r)

winner preferred 88% of the time · Δr = 0 → coin flip

a human comparison becomes one number — and the policy is bent to climb it
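A minimal sketch of what a call like `preference_prob(delta_r)` could compute, assuming the standard Bradley-Terry (logistic) form in which the probability that one response beats another depends only on the reward gap Δr. The page does not show the implementation, so the standalone function below is an assumption, not the notebook's actual code.

```python
import math

def preference_prob(delta_r: float) -> float:
    """Bradley-Terry probability that the higher-scored response is preferred.

    delta_r is the reward gap r(winner) - r(loser). The logistic form is an
    assumption here; the page only shows the call, not its body.
    """
    return 1.0 / (1.0 + math.exp(-delta_r))

coin_flip = preference_prob(0.0)    # equal rewards
gap_of_two = preference_prob(2.0)   # a reward gap of 2
print(f"{coin_flip:.2f} {gap_of_two:.2f}")  # prints "0.50 0.88"
```

Under this assumed form, a gap of about 2 reward units is what produces a winner preferred roughly 88% of the time, and a gap of 0 is the coin flip.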

RLHF / RLAIF
reward model
PPO / DPO

Shorts

SFT: Imitation, and the Ceiling It Hits

Supervised fine-tuning is the imitation step of post-training: show the model (prompt → ideal answer) pairs and minimize cross-entropy on the target tokens. It teaches the format of being an assistant — and hits a ceiling that preference learning exists to break.

sft
fine-tuning
post-training
instruction-tuning
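As a rough illustration of "minimize cross-entropy on the target tokens", here is a NumPy sketch of a masked loss. The function name `sft_loss`, the argument names, and the simplification that `logits[t]` is already aligned to predict `token_ids[t]` are all assumptions made for the example, not something the short specifies.

```python
import numpy as np

def sft_loss(logits, token_ids, target_mask):
    """Mean cross-entropy over answer tokens only.

    logits:      (T, V) scores, assumed already aligned so that row t
                 is the prediction for token_ids[t]
    token_ids:   (T,) integer ids of the observed tokens
    target_mask: (T,) 1 for answer tokens, 0 for prompt tokens
    """
    logits = np.asarray(logits, dtype=float)
    token_ids = np.asarray(token_ids)
    mask = np.asarray(target_mask, dtype=float)

    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

    # Negative log-likelihood of each observed token.
    token_nll = -log_probs[np.arange(len(token_ids)), token_ids]

    # Prompt positions contribute nothing; average over answer positions.
    return float((token_nll * mask).sum() / mask.sum())

# Uniform logits over a 4-token vocabulary give a loss of log(4).
loss = sft_loss(np.zeros((3, 4)), [0, 1, 2], [0, 1, 1])
```

Masking the prompt is the design choice worth noticing: the model is graded only on reproducing the ideal answer, not on predicting the user's prompt.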

RLHF: From Likelihood to Preference in Three Stages

Reinforcement Learning from Human Feedback is how a model is bent from optimizing likelihood to optimizing preference: an SFT base, a reward model trained on human comparisons, and a policy optimized against that reward under a KL leash.

rlhf
reward-model
ppo
preference-learning
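The "KL leash" can be sketched as a penalty subtracted from the reward-model score. This is one common formulation, assumed here for illustration: the function name, the coefficient `beta`, and the single-sample KL estimate (sum of per-token log-probability differences) are not taken from the short.

```python
def shaped_reward(reward, logp_policy, logp_ref, beta=0.1):
    """Reward-model score minus a KL penalty for one sampled response.

    logp_policy / logp_ref: per-token log-probabilities of the sampled
    response under the current policy and the frozen SFT reference.
    Their summed difference is a single-sample estimate of the KL term.
    beta is a hypothetical penalty strength.
    """
    kl_estimate = sum(p - r for p, r in zip(logp_policy, logp_ref))
    return reward - beta * kl_estimate

# The policy assigns this response more probability than the reference
# does, so part of the raw reward of 1.0 is taken back by the leash.
r = shaped_reward(1.0, [-1.0, -0.5], [-1.5, -1.0], beta=0.1)
```

With `beta=0` the policy is free to chase the reward model wherever it leads; larger values keep it closer to the SFT base.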

RLAIF: When the Labeler Is a Model

Reinforcement Learning from AI Feedback swaps the human labeler for a model: a capable model judges which of two responses is better, and those AI preferences train the reward. Constitutional AI is its most influential form — and the values don't disappear, they move and become explicit.

rlaif
constitutional-ai
reward-model
alignment

What is Grouped-Query Attention (GQA)?

GQA sits between full multi-head attention and MQA: query heads are partitioned into a small number of groups, and each group shares one K/V set. Most of the KV-cache savings of MQA, most of the head diversity of MHA — and a cheap conversion path from existing checkpoints.

llm
attention
gqa
kv-cache
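The grouping and the cache arithmetic can be sketched in a few lines. The helper names and the example model sizes below are hypothetical, and the head-to-group mapping assumes contiguous groups, which is one common layout rather than the only possible one.

```python
def kv_head_for_query_head(query_head, n_query_heads, n_kv_heads):
    """Index of the shared K/V head used by a given query head."""
    if n_query_heads % n_kv_heads != 0:
        raise ValueError("n_query_heads must be a multiple of n_kv_heads")
    group_size = n_query_heads // n_kv_heads
    return query_head // group_size

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """KV-cache size for one sequence; the leading 2 counts keys and values."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value

# Eight query heads sharing two K/V heads.
mapping = [kv_head_for_query_head(h, 8, 2) for h in range(8)]
print(mapping)  # prints [0, 0, 0, 0, 1, 1, 1, 1]

# Hypothetical model: 32 layers, head_dim 128, 4096 tokens, 2-byte values.
mha = kv_cache_bytes(32, 32, 128, 4096)  # one K/V head per query head
gqa = kv_cache_bytes(32, 8, 128, 4096)   # 8 K/V groups
mqa = kv_cache_bytes(32, 1, 128, 4096)   # a single shared K/V head
print(mha // gqa, mha // mqa)  # prints "4 32"
```

The cache scales linearly with the number of K/V heads, which is why a handful of groups already recovers most of the savings of MQA.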

Latest long reads

How LLMs Learn Human Preferences: RLHF, RLAIF and Beyond

How a next-token predictor is bent toward what humans prefer: reward modeling, PPO, RLAIF/Constitutional AI, and the offline-preference family — and why so much of a model's 'personality' and 'emotional' behavior is decided in this stage.

rlhf
rlaif
reward-model
ppo

Why Different Models Feel Like Different Personalities

Same engine, different knobs: why one model reads as warm and another as businesslike. Traces 'personality' to concrete training choices — data mix, preference guidelines, reward model, safety tuning, character training — and folds in sycophancy as a personality artifact of RLHF.

personality
rlhf

Why LLMs Sound Emotional — and Whether They Understand Emotion

Two halves of one question: why an LLM's emotional language is generated, not felt, and where it comes from (preference data, reward models, safety tuning, system prompts); and whether it can actually understand emotion in others (theory of mind, the recognition benchmarks, and where the fluent performance turns brittle).

emotion
empathy
safety-tuning
theory-of-mind

Embeddings and Retrieval

Embeddings: How Geometry Pretends to Be Meaning

Embeddings aren't an encoding of text — they're an attempt to make geometry behave as if it carried meaning. What it means to compress text into a fixed-length vector, how contrastive learning turns statistical structure into distances and directions, why cosine similarity works (and when it stops), how dimension, chunking, context window, and reranking change the physics of a retrieval pipeline, and where embeddings lie usefully.

embeddings
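A small sketch of cosine similarity, to make concrete what "distances and directions" means here. The toy vectors are made up for illustration; real embedding vectors would come from a model.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; ignores their lengths."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

same_direction = cosine_similarity([1, 2, 3], [2, 4, 6])  # scaled copy
orthogonal = cosine_similarity([1, 0], [0, 1])            # no shared direction
```

The first pair scores 1.0 even though one vector is twice as long as the other: cosine compares direction only, which is part of why it works and also one of the places it can stop working when magnitude carries information.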

Categories

Storage and Streaming

Storage formats, streaming, object stores, relational databases.

4 chapters

Time Series

Data preparation, forecasting models, deep learning, foundation models, evaluation, maintenance.

10 chapters

LLM

Transformer architecture, the generation loop, prompt engineering, structured outputs, agent patterns.

13 chapters

Embeddings and Retrieval

Embeddings, vector databases, sparse and hybrid retrieval, chunking, reranking, and the search lineage from BoW to E5.

3 chapters

Econometrics

Endogeneity, IV, RDD, panel data, causal ML.

7 chapters

Favorites

Why LLMs Sound Emotional — and Whether They Understand Emotion

Why Different Models Feel Like Different Personalities

Embeddings and Retrieval

Embeddings: How Geometry Pretends to Be Meaning

Pricing and Elasticity

Pricing as the worked example for the Econometrics track: why price is endogenous, the identification strategies (IV, FE, RD, DML, CATE), cross-price elasticity and cannibalization, and what-if analysis with its constant-elasticity caveats.

pricing
elasticity
demand
causal-inference
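A sketch of the kind of what-if calculation the blurb refers to, under the constant-elasticity assumption it warns about: quantity responds to price as Q1 = Q0 · (P1 / P0)^ε. The function name and the numbers are hypothetical, and the elasticity is taken as given here, whereas estimating it credibly is the hard part the note is about.

```python
def what_if_quantity(base_quantity, base_price, new_price, elasticity):
    """Constant-elasticity what-if: Q1 = Q0 * (P1 / P0) ** elasticity."""
    if base_price <= 0 or new_price <= 0:
        raise ValueError("prices must be positive")
    return base_quantity * (new_price / base_price) ** elasticity

# Hypothetical inputs: own-price elasticity of -1.5, price raised 10 -> 11.
q1 = what_if_quantity(1000.0, 10.0, 11.0, elasticity=-1.5)
revenue_before = 10.0 * 1000.0
revenue_after = 11.0 * q1
```

With these made-up inputs a 10% price increase costs roughly 13% of unit volume, so revenue falls. The caveat is that a single ε is assumed to hold across the whole price range, which is exactly what tends to break for large price moves.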

Causal ML Beyond Econometrics

Causal ML at the meeting point of econometrics and ML — ATE vs CATE, uplift modelling, DML and orthogonal-moments inference, causal forests, counterfactual prediction, off-policy evaluation, and the standard mistakes from treating predictive models as causal.

causal-ml
uplift
dml
cate

Fine-Tuning LLMs: When the Weight Delta Is Worth It

Fine-tuning is not prompt repair. It is a decision to write a reusable parameter delta into an existing checkpoint. That delta changes future logits, defaults, and trade-offs. This note is about when that is worth doing: what fine-tuning actually buys, how to tell whether a gap belongs in the weights, and why data, evals, forgetting, and probability shape matter more than the slogan 'just fine-tune it'.

fine-tuning
residual-stream

Embeddings and Retrieval

Chunking Strategies

Chunking is more architectural than it looks. A walk through what a chunk has to satisfy at six different stages simultaneously, what RecursiveCharacterTextSplitter does under the hood, and what the 2024–2026 toolbox (late chunking, contextual retrieval, contextualized chunk embeddings, BGE-M3 multi-functionality, ColBERT late interaction) is for.

chunking
rag

Embeddings and Retrieval

How Text Became Geometry

Sixty years of incremental work behind the modern embedding vector. From bag-of-words and BM25, through Word2Vec's distributional hypothesis, to contextual embeddings and contrastive retrieval models. The point isn't trivia — it's that each older idea is still in production today, and the picture only makes sense once you've seen the road that led here.

embeddings
history
retrieval

The Hindsight Corpus: Time in LLM Pretraining Data

Saying a model was 'trained on text written before T' invites a picture of human knowledge as of T. The actual corpus is volumetrically skewed toward recent years, dominated by retroactively-edited sources like Wikipedia, missing reliable per-document timestamps, and survivor-biased for older periods. The mechanisms, the failure modes that fall out, what's silently absent from datasheets, and what time-aware pretraining would have to do differently.

pretraining
training-data
temporal

The Missing Now: Temporal Grounding in LLM Agents

A chat transcript preserves order but not elapsed time, world state, or whether earlier hypotheses have expired. For long-running agents, temporal grounding is a runtime problem, not a model problem — what 'now' actually is, the failure modes that fall out when context gets treated as state, the primitives (clocks, event logs, state reducers, expectations, monitors) that close the gap, and how to measure whether it works.

agents
temporal-grounding
state-management

The Physics of Hallucination

What hallucination looks like at the level of the transformer's internal computation — distributed representations, signal competition in the residual stream, the softmax bottleneck, the activation-output gap, and the architectural reasons there is no first-class epistemic channel.

hallucinations