There is a question that shows up in every vector-database tutorial and every RAG onboarding doc: cosine similarity, dot product, or Euclidean distance? It is presented as a real fork in the road, with tradeoffs to weigh. For the embeddings most teams actually use in 2026, it is very nearly a non-question — and the energy spent agonizing over it would be better spent on the two decisions hiding underneath, neither of which anyone frames as a choice at all.
What the three actually measure
Strip the jargon and there are only two geometric quantities in play: the direction a vector points, and its length.
- Cosine similarity measures direction only. It is the dot product divided by both magnitudes, a·b / (‖a‖‖b‖), which is the cosine of the angle between the vectors. Length is divided out by construction.
- Dot product (inner product) measures direction and length together: a·b. Two vectors at the same angle but different magnitudes get different scores; the longer one wins.
- Euclidean (L2) distance measures the straight-line gap between the two points: ‖a−b‖. Closer is more similar.
Manhattan / L1 — the sum of absolute per-dimension differences — exists too, but for dense text embeddings it is a curiosity, not a default.
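To make the three definitions concrete, here is a minimal sketch in plain Python. The helper names and the toy vectors are illustrative assumptions, not taken from any library.

```python
import math

def dot(a, b):
    """Inner product: sensitive to both direction and length."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean (L2) length of a vector."""
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    """Dot product with both magnitudes divided out: direction only."""
    return dot(a, b) / (norm(a) * norm(b))

def euclidean_distance(a, b):
    """Straight-line gap between two points; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query = [1.0, 0.0]
short = [2.0, 2.0]    # 45 degrees away from the query
long_ = [20.0, 20.0]  # same direction as `short`, ten times the length

# Cosine cannot tell the two candidates apart; dot product and L2 can.
print(cosine_similarity(query, short), cosine_similarity(query, long_))
print(dot(query, short), dot(query, long_))
print(euclidean_distance(query, short), euclidean_distance(query, long_))
```

The two candidates point the same way, so cosine scores them identically, while dot product favors the longer vector and Euclidean distance favors the shorter one.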
The fact that collapses the debate
Here is the part the tutorials bury. The moment your vectors are L2-normalized — scaled to unit length — the three metrics stop disagreeing. As scikit-learn puts it, cosine similarity is the L2-normalized dot product: project both vectors onto the unit sphere and their dot product is already the cosine of the angle. And the Euclidean distance between two unit vectors is locked to that same angle by a clean identity:
For unit vectors, ‖a − b‖² = 2 − 2(a·b). Euclidean distance is a strictly decreasing function of the dot product, which equals cosine. All three sort your candidates into the same order.
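The identity follows in three lines from expanding the squared norm, assuming only that both vectors have unit length:

```latex
\begin{aligned}
\|a - b\|^2 &= (a - b)\cdot(a - b) \\
            &= \|a\|^2 - 2\,(a \cdot b) + \|b\|^2 \\
            &= 2 - 2\,(a \cdot b) \qquad \text{since } \|a\| = \|b\| = 1 .
\end{aligned}
```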
Same order means the same top-k. For ranking — which is all most retrieval does — cosine, dot product, and Euclidean are the same metric wearing three costumes once vectors are normalized.
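The claim is easy to check empirically. The sketch below uses random vectors as a stand-in for real embeddings (an assumption made purely for illustration), normalizes them, and ranks the same candidates under all three metrics.

```python
import math
import random

def normalize(v):
    """Scale a vector to unit L2 length."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

rng = random.Random(0)  # fixed seed so the run is repeatable
docs = [normalize([rng.gauss(0, 1) for _ in range(8)]) for _ in range(50)]
query = normalize([rng.gauss(0, 1) for _ in range(8)])

ids = range(len(docs))
by_dot = sorted(ids, key=lambda i: -dot(query, docs[i]))     # higher is better
by_cos = sorted(ids, key=lambda i: -cosine(query, docs[i]))  # higher is better
by_l2 = sorted(ids, key=lambda i: l2(query, docs[i]))        # lower is better

print(by_dot[:5], by_cos[:5], by_l2[:5])
```

Barring exact floating-point ties, the three orderings come out identical, so any top-k cut of them is identical too.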
And here is why that matters in practice: many modern embedding models already normalize. OpenAI's docs state plainly that text-embedding-3 outputs are normalized to length 1, so "cosine similarity and Euclidean distance will result in the identical rankings." Sentence-Transformers offers normalize_embeddings=True as an option on encode() (it is off by default, so set it or confirm that the model normalizes on its own); Cohere's v3 models are unit-normalized. For these, which make up the bulk of production retrieval, the metric dropdown is decorative.
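Rather than taking normalization on faith, it is cheap to verify on a sample of your own vectors. A minimal sketch, where the function name and the tolerance are assumptions chosen for illustration:

```python
import math

def is_unit_normalized(vectors, tol=1e-3):
    """Return True if every vector's L2 length is within `tol` of 1."""
    return all(
        abs(math.sqrt(sum(x * x for x in v)) - 1.0) <= tol
        for v in vectors
    )

# Hypothetical samples standing in for embeddings pulled from your store.
normalized_sample = [[0.6, 0.8], [1.0, 0.0]]
raw_sample = [[3.0, 4.0], [0.6, 0.8]]

print(is_unit_normalized(normalized_sample))
print(is_unit_normalized(raw_sample))
```

If the check fails, the three metrics can disagree and the choice of metric becomes a real decision.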
When the choice is real
The fork only appears when vectors are not normalized, and then the whole question reduces to a single word: magnitude. Dot product rewards it; cosine ignores it. This is a genuine design decision, not a default:
- If vector length encodes something you want in the ranking (term frequency in a sparse vector, document popularity, the model's own confidence), dot product is the right call, which is exactly why sparse and learned-sparse retrieval leans on inner product. Pinecone even requires dotproduct for sparse indexes.
- If length is noise (a long document is not "more relevant" just because it is long), cosine protects you by throwing magnitude away.
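A two-candidate toy case shows the disagreement on unnormalized vectors. The vectors and names below are made up for illustration: one candidate is perfectly aligned with the query but short, the other is off-axis but long.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

query = [1.0, 1.0]
candidates = {
    "aligned_short": [1.0, 1.0],   # same direction as the query, small magnitude
    "off_axis_long": [10.0, 0.0],  # 45 degrees off, large magnitude
}

best_by_cosine = max(candidates, key=lambda name: cosine(query, candidates[name]))
best_by_dot = max(candidates, key=lambda name: dot(query, candidates[name]))

print(best_by_cosine)  # cosine ignores length, so the aligned vector wins
print(best_by_dot)     # dot product rewards length, so the long vector wins
```

Which answer is "right" depends entirely on whether the extra length carries signal you want ranked.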
There is even an academic asterisk worth knowing: Steck et al.'s "Is Cosine-Similarity of Embeddings Really About Similarity?" shows that for certain learned embeddings (regularized linear and matrix-factorization models), cosine can produce arbitrary, even meaningless similarities. Cosine is a safe default for transformer text embeddings; it is not a law of nature.
The two decisions that actually move recall
So if the metric is mostly cosmetic, where does the recall go? Two places, both of them silent failures:
1. Match the metric your model was trained with. A model trained against a cosine objective expects cosine; query it with raw dot product on unnormalized output and nothing errors — scores simply skew toward high-magnitude vectors and your recall quietly drops. The fix is boring: read the model card, use the metric it specifies, and if it emits normalized vectors, cosine and dot product are interchangeable anyway.
2. Normalize once, then use inner product. Cosine costs a division by both magnitudes on every comparison. Normalize every vector a single time at insert and the magnitudes are 1, the division vanishes, and you can run raw dot product — equal to cosine, minus the arithmetic. This is precisely what Weaviate and Qdrant do internally: store normalized, score with dot product. It also lets the index use Maximum Inner-Product Search, the path SIMD and quantization kernels optimize hardest.
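The normalize-once pattern can be sketched as a toy brute-force index. This is an illustration of the idea under simplifying assumptions (no ANN structure, no quantization), not how any particular database implements it; the class and method names are hypothetical.

```python
import heapq
import math

class UnitVectorIndex:
    """Toy index: normalizes on insert, scores with raw dot product."""

    def __init__(self):
        self._ids = []
        self._vectors = []

    @staticmethod
    def _normalize(v):
        length = math.sqrt(sum(x * x for x in v))
        if length == 0.0:
            raise ValueError("cannot normalize a zero vector")
        return [x / length for x in v]

    def add(self, item_id, vector):
        # The division by the magnitude is paid once, at insert time.
        self._ids.append(item_id)
        self._vectors.append(self._normalize(vector))

    def search(self, query, k=3):
        q = self._normalize(query)
        # On unit vectors a plain dot product already equals cosine similarity.
        scored = (
            (sum(x * y for x, y in zip(q, v)), item_id)
            for item_id, v in zip(self._ids, self._vectors)
        )
        return heapq.nlargest(k, scored)  # list of (score, item_id), best first

index = UnitVectorIndex()
index.add("a", [3.0, 4.0])
index.add("b", [40.0, 30.0])
index.add("c", [-1.0, 0.0])
results = index.search([6.0, 8.0], k=2)
print(results)
```

The stored magnitudes never influence the result: "b" is ten times longer than "a" but still ranks second, because only its direction survives normalization.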
The one hard rule sits at the database layer: the metric you build the index with must match the metric you query with. pgvector exposes <=> (cosine distance), <#> (negative inner product), and <-> (L2 distance) as distinct operators, each paired with its own index operator class (vector_cosine_ops, vector_ip_ops, vector_l2_ops). Build a cosine index and query with the L2 operator and Postgres cannot use that index: it silently falls back to a sequential scan. The results are still exact for the operator you wrote, but the ANN index you built sits idle and the only symptom is latency. Whatever vector database you run, the metric is a property of the collection, not a per-query whim, and it has to agree with how your embeddings were quantized and indexed down at the ANN layer.
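One way to keep build-time and query-time metrics from drifting apart is to derive both SQL statements from a single setting. A minimal sketch for pgvector follows; the operator and operator-class names are pgvector's own, while the table `items`, the columns `id` and `embedding`, the choice of an HNSW index, and the helper names are assumptions for illustration. The `%s` placeholders follow the psycopg parameter style.

```python
# Each metric maps to the pgvector distance operator and the matching opclass.
PGVECTOR_METRICS = {
    "cosine": {"operator": "<=>", "opclass": "vector_cosine_ops"},
    "inner_product": {"operator": "<#>", "opclass": "vector_ip_ops"},
    "l2": {"operator": "<->", "opclass": "vector_l2_ops"},
}

def index_sql(metric, table="items", column="embedding"):
    """Build the CREATE INDEX statement for the chosen metric."""
    opclass = PGVECTOR_METRICS[metric]["opclass"]
    return f"CREATE INDEX ON {table} USING hnsw ({column} {opclass});"

def query_sql(metric, table="items", column="embedding"):
    """Build the nearest-neighbor query using the operator that matches the index."""
    operator = PGVECTOR_METRICS[metric]["operator"]
    return f"SELECT id FROM {table} ORDER BY {column} {operator} %s::vector LIMIT %s;"

METRIC = "cosine"  # chosen once per table, never per query
print(index_sql(METRIC))
print(query_sql(METRIC))
```

Because both statements read from the same `METRIC` value, a cosine index can only ever be paired with the cosine operator.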
So: stop treating the similarity metric as the interesting knob. For normalized embeddings — which is what you almost certainly have — pick cosine, or pick dot product on normalized vectors for the speed, and move on. The recall you were worried about losing was never going to be lost there. It leaks out of the mismatch between how your model was trained and how your index was built, and that is the dial worth your attention.



