Chapter 15 of 25
HNSW: How Vector Search Stays Logarithmic
Created May 28, 2026 Updated Jun 7, 2026
Brute-force vector search is O(N × d) — compare the query against every vector in the corpus using cosine or dot product. For 10M vectors at dim 1024, that's 10 billion multiply-adds per query — seconds on CPU. Production retrieval needs milliseconds. The way around this is approximate nearest neighbor (ANN) search, and the dominant ANN index in production is HNSW (Hierarchical Navigable Small World) — Malkov & Yashunin, 2018.
Why this one and not the alternatives? Unlike coarse-partition methods like IVF — which bucket vectors into clusters and only search the nearest few — HNSW has no hard cluster boundary, so it can route around a bad partition choice instead of missing a neighbor that happened to fall on the wrong side of one. That robustness is why production vector databases — Qdrant, Weaviate, Chroma, Milvus — are built around HNSW as the default or primary index (IVF and disk-based indexes stay relevant at billion-scale or under tight memory budgets).
HNSW combines two ideas:
1. Small-world graphs. Each node is connected to its closest neighbors plus a few long-range "shortcut" links. The local connections let you refine; the long-range ones let you teleport. The payoff: navigation paths tend to grow only roughly logarithmically with dataset size — any node reachable from any other in a handful of hops. (This is an empirical regularity of small-world graphs, not a worst-case guarantee.) It's the same structure the Watts-Strogatz model and the "six degrees of separation" intuition both describe.
2. Hierarchy. Several layers of small-world graph stacked on top of each other:
- Top layer — a handful of nodes, sparse, long-range jumps across the whole space.
- Middle layers — progressively denser.
- Layer 0 — every vector, fine-grained local connections.
Each node is randomly assigned to a top layer with exponential decay probability — most stay at layer 0, very few make it to the top.
Search procedure:
1. Enter at the top layer at a fixed entry point.
2. Greedy walk: move to the neighbor closest to the query.
3. When no neighbor improves, drop down one layer.
4. Repeat to layer 0.
5. Return the top-k nearest from the final neighborhood.
One refinement matters: the upper layers are essentially a greedy single-path walk, but layer 0 is not purely greedy. There the search keeps a candidate set of size ef_search, exploring several promising nodes at once instead of committing to one path. That beam-like set is exactly what ef_search controls — and why widening it buys recall at the cost of latency.
The atlas analogy: you don't find a street address by scanning every street in the country. You start with the country map, find the right region, zoom into the city, then the district, then the block. HNSW does the same — coarse routing at the top, fine refinement at the bottom.
Three parameters that matter:
M— number of bidirectional connections per node (typical: 16, 32, 48). Higher = better recall, more memory, slower insertion.ef_construction— candidate list size during graph build (typical: 200–400). Higher = better graph quality, slower build.ef_search— candidate list size at query time (typical: 50–200). Higher = better recall, higher latency.
M is fixed once you've built the index. ef_search is per-query — you tune it live for the recall/latency point you want.
The cost is memory. A 1M-vector HNSW index at dim 1024 with M = 32:
vectors (f32): 1M × 1024 × 4 bytes = 4 GB
graph: 1M × 32 × 8 bytes ≈ 256 MB
total: ≈ 4.25 GB
For 10M vectors at f32 that's 40+ GB just for the vectors. This is why production vector databases lean hard on quantization: int8 cuts vectors 4×, binary cuts 32× (with a rerank-on-fp32 step to recover precision). Quantization is what makes the math fit on a single machine.
ANN ≠ exact search. HNSW returns approximately nearest neighbors — the standard metric is recall@k, the fraction of queries where the true top-k is found. Typical production recall: 0.95–0.99. The point of HNSW is to pay 1–5% recall loss for a 1000× speedup.
How HNSW parameters interact with chunking choices, quantization, and the storage economics of production indexes: Chunking Strategies.