lenatriestounderstand

Chapter 14 of 25

Cosine, Dot Product, Euclidean: When Are They the Same Thing?

Created May 28, 2026 Updated May 28, 2026

Pick up any vector-database client and you'll see three similarity metrics: cosine, dot product, Euclidean. They look like different choices. On modern embeddings, they're usually not.

Cosine similarity measures the angle between two vectors:

cosine(a, b) = (a · b) / (‖a‖ · ‖b‖)

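Range −1 to 1. It is invariant to vector length (norm): scaling either vector leaves the score unchanged, so a long document and a short one on the same topic can still score close to 1.0 if their embeddings point the same way.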
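As a quick check of the formula and the length-invariance claim, here is a minimal NumPy sketch. The function name `cosine` and the toy vectors are illustrative only, not taken from any particular client library.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 3.0])
stretched = 10.0 * a                      # same direction, ten times the norm
flipped = -a                              # opposite direction
orthogonal = np.array([3.0, 0.0, -1.0])   # a . orthogonal = 3 + 0 - 3 = 0

print(cosine(a, stretched))    # close to 1.0: scaling does not change the angle
print(cosine(a, flipped))      # close to -1.0
print(cosine(a, orthogonal))   # close to 0.0
```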

Dot product is the unnormalized numerator: a · b. It's faster (one multiplication per dimension, no square roots) but mixes angle with magnitude. If magnitude correlates with length, confidence, frequency, or some training artifact, dot product will reward that magnitude in addition to semantic direction.

Euclidean distance is the straight-line gap between the tips: ‖a − b‖. Geometrically simpler, but on raw embeddings it has the same problem in reverse: it measures both direction and norm. If norm carries artifacts rather than meaning, distance becomes a mixed signal.
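To make the mixed signal concrete, the sketch below scores three hand-picked 2-D vectors against one query with all three metrics. The vectors and their names are made up for illustration; real embeddings have hundreds of dimensions, but the effect is the same.

```python
import numpy as np

query = np.array([1.0, 0.0])
candidates = {
    "same_direction_short": np.array([0.5, 0.0]),
    "same_direction_long": np.array([5.0, 0.0]),
    "off_direction_long": np.array([3.0, 3.0]),  # 45 degrees away from the query
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Score every candidate with each metric.
scores = {
    name: {
        "cosine": cosine(query, v),
        "dot": float(np.dot(query, v)),
        "euclidean": float(np.linalg.norm(query - v)),
    }
    for name, v in candidates.items()
}

for name, s in scores.items():
    print(name, s)
```

Cosine gives both same-direction vectors the top score. Dot product ranks off_direction_long (3.0) above same_direction_short (0.5) purely because of its norm, and Euclidean distance places off_direction_long (about 3.61) closer to the query than same_direction_long (4.0), even though the latter points exactly the same way.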

Here's where it collapses. Most modern embedding models output L2-normalized vectors — every embedding sits on the unit sphere by construction (or you normalize on write). On the unit sphere ‖a‖ = ‖b‖ = 1, and:

‖a − b‖² = ‖a‖² + ‖b‖² − 2 · (a · b) = 2 − 2 · cos(a, b)

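The right-hand side decreases monotonically as cosine increases, and on unit vectors cosine equals the dot product. For exact search, ranking by descending cosine similarity, descending dot product, or ascending Euclidean distance is therefore identical (up to ties and floating-point rounding). Top-1, top-10, top-1000: same results, same order. The metric choice does not change retrieval.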
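The equivalence is easy to check numerically. The sketch below uses random vectors as a stand-in for real embeddings (an assumption made purely for the demo), normalizes them, and compares the three rankings under brute-force exact search.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins for embeddings, projected onto the unit sphere.
docs = rng.normal(size=(1000, 64))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
query = rng.normal(size=64)
query /= np.linalg.norm(query)

dot = docs @ query
cos = dot / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))
dist = np.linalg.norm(docs - query, axis=1)

# The identity for unit vectors: squared distance = 2 - 2 * cosine.
print(np.allclose(dist ** 2, 2 - 2 * cos))

# Higher is better for dot and cosine; lower is better for distance.
top_dot = np.argsort(-dot)[:10]
top_cos = np.argsort(-cos)[:10]
top_dist = np.argsort(dist)[:10]
print(np.array_equal(top_dot, top_cos), np.array_equal(top_dot, top_dist))
```

On unit vectors the three orderings should agree. Exact ties, or scores that differ only in the last few bits, could in principle be reordered by rounding, and approximate indexes add their own sources of variation.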

So why does cosine remain the convention? Two reasons:

  1. Intuitive scale. −1 to 1 is easier to reason about than Euclidean's 0 to 2 on the unit sphere.
  2. Implementation efficiency. On normalized vectors, dot(a, b) = cos(a, b) — one numpy.dot returns both. The standard production pattern is normalize once at write time, then dot at query time. No division, no square root.
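A minimal sketch of that normalize-on-write, dot-on-query pattern, using a plain NumPy matrix as the "index". The helper names `normalize`, `build_index`, and `search` are hypothetical and do not correspond to any vector-database API.

```python
import numpy as np

def normalize(x):
    """Scale vectors to unit L2 norm along the last axis (guards against zero norm)."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)

def build_index(raw_vectors):
    """Write path: normalize once and store the unit vectors."""
    return normalize(np.asarray(raw_vectors, dtype=np.float32))

def search(index, raw_query, k=3):
    """Query path: normalize the query, then a single matrix-vector product."""
    q = normalize(np.asarray(raw_query, dtype=np.float32))
    scores = index @ q                 # equals cosine, since both sides are unit norm
    top = np.argsort(-scores)[:k]      # highest score first
    return top, scores[top]

index = build_index([[3.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
top, top_scores = search(index, [10.0, 1.0], k=2)
print(top, top_scores)
```

The stored vectors' original norms are discarded at write time, so this layout only suits models where magnitude carries no signal you want to keep.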

When does the metric choice actually matter?

  • Unnormalized vectors. Some models (older Word2Vec, GloVe, raw BERT pooling) emit vectors with meaningful magnitudes. Then cosine vs dot becomes a real choice. Most modern dense retrievers normalize internally; check the model card.
  • Embeddings designed for inner product. Some retrievers (DPR, some matryoshka variants) are trained with dot product as the scoring function and may not be L2-normalized — using cosine adds an unnecessary normalization the model wasn't trained with.
  • Mixing magnitude and angle on purpose. Some specialized scoring (e.g. BM25-weighted dense, learned sparse retrievers) deliberately uses dot product because magnitude carries signal.
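When the model card is unclear, one way to find out is to measure the norms of a sample of outputs. The sketch below does that and shows cosine and dot product disagreeing on unnormalized vectors. The helper `looks_l2_normalized`, its tolerance, and the toy vectors are assumptions for illustration.

```python
import numpy as np

def looks_l2_normalized(vectors, tol=1e-3):
    """True if every row has norm within tol of 1. The tolerance is an arbitrary choice."""
    norms = np.linalg.norm(vectors, axis=1)
    return bool(np.all(np.abs(norms - 1.0) < tol))

# Hypothetical unnormalized embeddings: doc 0 is aligned with the query but short,
# doc 1 is less aligned but has a much larger norm.
docs = np.array([[0.2, 0.0], [4.0, 3.0]])
query = np.array([1.0, 0.0])

dot = docs @ query
cos = dot / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))

print(looks_l2_normalized(docs))                 # these rows are not unit norm
print(int(np.argmax(cos)), int(np.argmax(dot)))  # top-1 differs between the two metrics
```

Here cosine prefers doc 0 (scores 1.0 vs 0.8) while dot product prefers doc 1 (0.2 vs 4.0). Which one is right depends on whether the model was trained to put meaning in the norm.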

In practice, the harder question for retrieval quality is rarely the metric: it's whether the space itself carries signal. Anisotropic spaces (raw pretrained BERT is the usual example) can have cosine similarities clustered around 0.8 even between unrelated text, because all vectors point in similar directions. No metric rescues that; what does is reshaping the space, typically with contrastive training that explicitly spreads it out.
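A rough way to see anisotropy is to measure the average cosine between unrelated pairs. The sketch below uses synthetic data rather than real BERT outputs: random vectors, with and without a large shared offset that pushes them all in one direction. The offset size and the helper `mean_pairwise_cosine` are assumptions chosen for the demo.

```python
import numpy as np

def mean_pairwise_cosine(x):
    """Average cosine similarity over all distinct pairs of rows."""
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    sims = unit @ unit.T
    n = len(x)
    off_diagonal_sum = sims.sum() - np.trace(sims)  # drop the self-similarities
    return float(off_diagonal_sum / (n * (n - 1)))

rng = np.random.default_rng(0)
isotropic = rng.normal(size=(500, 64))       # directions spread evenly
shared_offset = 2.0 * np.ones(64)            # common component added to every vector
anisotropic = isotropic + shared_offset      # everything now leans the same way

print(mean_pairwise_cosine(isotropic))       # near zero for evenly spread directions
print(mean_pairwise_cosine(anisotropic))     # high, although the pairs are unrelated
```

If a sample of unrelated texts from your own model gives a high average here, similarity scores will be compressed into a narrow band regardless of which metric you pick.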

Full breakdown of why cosine works at all, anisotropy, and the geometry retrieval depends on: Embeddings: How Geometry Pretends to Be Meaning.