Every benchmark that asks "which embedding model is best" quietly assumes you are willing to run a transformer for every string you embed. That assumption is the expensive part. A 22-million-parameter encoder is small by 2026 standards, but you still pay for a full forward pass on every query and every document, and at index-scale — tens of millions of chunks — that pass is most of your bill and nearly all of your latency.

Static embeddings ask a heretical question: what if you ran the transformer once, ahead of time, and then never again?

The trick: an embedding without the network

A static embedding model is a lookup table. For each token in the vocabulary it stores one fixed vector. To embed a sentence you look up each token's vector and average them. That is the entire inference path — no attention, no layers, no forward pass. It is closer to a dictionary lookup than to a neural network call.
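
To make "lookup and average" concrete, here is a minimal sketch in NumPy. The four-word vocabulary, the random table, and the whitespace tokenizer are all hypothetical stand-ins; a real static model stores one row per tokenizer token and ships its own tokenizer.

```python
import numpy as np

# Hypothetical toy vocabulary and table (illustration only).
vocab = {"static": 0, "embeddings": 1, "are": 2, "fast": 3}
rng = np.random.default_rng(0)
table = rng.normal(size=(len(vocab), 8)).astype(np.float32)  # one row per token

def embed(text: str) -> np.ndarray:
    """Look up each known token's row and average the rows."""
    ids = [vocab[t] for t in text.lower().split() if t in vocab]
    if not ids:
        # No known tokens: fall back to a zero vector (a choice made for this sketch).
        return np.zeros(table.shape[1], dtype=table.dtype)
    return table[ids].mean(axis=0)

vec = embed("Static embeddings are fast")
```

The whole inference path is one fancy-index and one mean, which is why there is nothing to batch or warm up.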

This is why the numbers are absurd. Minish Lab reports that Model2Vec models run up to 500x faster on CPU than their teacher model and are roughly 50x smaller on disk. There is no GPU in the loop, no batching gymnastics, no warm-up. You embed a million short documents on a laptop while the transformer is still loading its weights.

The obvious objection is that we tried this more than a decade ago and called it word2vec. We did, and it was worse, because word2vec and GloVe learn their vectors from word co-occurrence statistics in raw text. The thing that makes 2026's static embeddings different is where the vectors come from.

How Model2Vec is actually built

Model2Vec does not train on text. It distills an existing sentence transformer, and it needs no training data to do it:

  1. Forward-pass the vocabulary through the teacher. Push every token through a strong embedding model and capture its output embedding. This is the key move — you are harvesting the context-distilled representations a trained transformer already produced, not co-occurrence statistics.
  2. PCA the result. Principal component analysis reduces the dimensionality, but its real job is to center and normalize the embedding space; Minish Lab notes it improves quality even when you don't shrink the dimensions.
  3. Weight tokens by Zipf rank. Rare tokens should count more than "the" and "of." Classic methods use IDF, which needs a corpus. Model2Vec approximates frequency from a token's rank in a frequency-sorted vocabulary — Zipf's law as a free stand-in for IDF, with no external data required.
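
The three steps above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not Model2Vec's actual code: the teacher's per-token outputs are replaced by a random stand-in matrix, the rows are assumed to be sorted most-frequent-first, and log(1 + rank) is one plausible Zipf-style weight chosen for the sketch.

```python
import numpy as np

def distill_table(token_vecs: np.ndarray, dims: int) -> np.ndarray:
    """Turn teacher token vectors into a weighted static lookup table.

    token_vecs: (vocab_size, teacher_dim), rows assumed sorted
    most-frequent-first (an assumption of this sketch).
    """
    # Step 2: PCA via SVD of the centered matrix.
    centered = token_vecs - token_vecs.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    reduced = centered @ vt[:dims].T  # project onto the top `dims` components

    # Step 3: Zipf-style weighting. Rank 1 is the most frequent token,
    # so the weight grows as tokens get rarer (assumed formula).
    ranks = np.arange(1, len(reduced) + 1)
    weights = np.log(1 + ranks)
    return reduced * weights[:, None]

# Step 1 stand-in: pretend these rows came from a teacher's forward pass.
rng = np.random.default_rng(0)
teacher_out = rng.normal(size=(100, 32))
static_table = distill_table(teacher_out, dims=16)
```

Note that nothing here reads a corpus: the only inputs are the teacher's vectors and the order of the vocabulary.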

Because each table entry inherits the teacher's learned representation, Model2Vec "outperforms any other static embeddings such as GloVe and BPEmb by a large margin." You can distill your own domain-specific model from your own teacher in minutes.

What it costs in quality — the honest number

Here is where you have to be a statistician and not a salesperson. Static embeddings are not free; they are cheap, and the difference matters.

Minish Lab's potion-base-32M scores 52.13 on MTEB, about 93% of all-MiniLM-L6-v2, a respected dense baseline. The retrieval-tuned potion-retrieval-32M lands lower, around 82% of the same baseline on retrieval specifically. (potion-multilingual-128M, distilled from bge-m3, extends the same approach to 101 languages.) So the headline is roughly: you keep 82–93% of a small dense baseline's quality, and the harder the task, the more of that last slice you forfeit.

Sentence Transformers reached the same place from the opposite direction. In January 2025, Hugging Face's Tom Aarsen published static models that are trained contrastively rather than distilled — static-retrieval-mrl-en-v1 retains 87.4% of all-mpnet-base-v2 on NanoBEIR while running 100x to 400x faster on CPU. They use Matryoshka truncation, so halving the retrieval dimensions costs only ~1.5%. Two roads — PCA distillation and contrastive training — converging on the identical artifact: a token lookup table.
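
Matryoshka truncation itself is almost embarrassingly simple at inference time: keep the leading coordinates and re-normalize. The sketch below assumes the vectors come from a model trained with a Matryoshka-style objective, since that training is what makes the leading dimensions carry most of the signal; the random matrix and its 1024-column width are stand-ins for illustration.

```python
import numpy as np

def truncate(vecs: np.ndarray, dims: int) -> np.ndarray:
    """Keep the first `dims` coordinates of each row and re-normalize to unit length."""
    cut = vecs[:, :dims]
    norms = np.linalg.norm(cut, axis=1, keepdims=True)
    return cut / np.clip(norms, 1e-12, None)  # guard against all-zero rows

rng = np.random.default_rng(0)
full = rng.normal(size=(4, 1024))  # stand-in for full-size embeddings
half = truncate(full, 512)         # half the storage, half the dot-product cost
```

Applying the same slice to a model that was not trained this way would discard information arbitrarily, so the small quality loss quoted above should not be assumed to transfer.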

Static embeddings don't make a model smarter. They make the throughput free, and charge you in context-sensitivity.

Where the missing 10% lives

The lost quality is not spread evenly — it is concentrated exactly where mean-pooling fails. Averaging token vectors throws away word order and context. "The dog bit the man" and "the man bit the dog" become nearly the same vector. Negation, word sense, and clause structure get flattened.
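
You can see the failure in a few lines. This sketch assumes a hypothetical word-level, lowercasing tokenizer and a random table; with a real subword or cased tokenizer the two vectors may differ slightly, but the mechanism is the same.

```python
import numpy as np

rng = np.random.default_rng(0)
words = ["the", "dog", "bit", "man"]
table = {w: rng.normal(size=16) for w in words}  # hypothetical token vectors

def embed(sentence: str) -> np.ndarray:
    """Mean-pool the token vectors; word order never enters the computation."""
    return np.mean([table[w] for w in sentence.lower().split()], axis=0)

a = embed("The dog bit the man")
b = embed("The man bit the dog")
same = np.allclose(a, b)  # both sentences are the same bag of tokens
```

Averaging is commutative, so any permutation of the same tokens lands on the same point, whoever did the biting.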

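So the decision rule is clean: use static embeddings where throughput and serving cost are the binding constraint and the meaning survives averaging, as in a first-stage index; keep a transformer in the loop where negation, word sense, or word order decides the answer.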

The mistake the leaderboard encourages is treating embedding quality as the only axis. For most retrieval systems the binding constraint is not the top of the MTEB chart — it is the serving cost of running a transformer over your whole corpus. Static embeddings move that constraint by an order of magnitude or two, and ask, in return, that you stop pretending word order never mattered. For a first-stage index, that is a trade worth making far more often than the benchmark culture admits.
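
One way to cash in that trade is a two-stage search: static vectors scan the whole corpus, and an order-aware model only sees the shortlist. The sketch below is an assumption about how such a pipeline might look, not something the models above prescribe. `rerank_score` is a hypothetical callable standing in for whatever expensive scorer you trust, and the document vectors are assumed to be unit-normalized.

```python
import numpy as np
from typing import Callable, List, Sequence

def two_stage_search(
    query_vec: np.ndarray,
    doc_vecs: np.ndarray,
    docs: Sequence[str],
    query: str,
    rerank_score: Callable[[str, str], float],  # hypothetical expensive scorer
    k: int = 100,
    final: int = 10,
) -> List[int]:
    """Return indices of the top `final` documents after reranking `k` candidates."""
    # Stage 1: cheap dot products over the whole corpus (cosine if unit-normalized).
    sims = doc_vecs @ query_vec
    candidates = np.argsort(-sims)[: min(k, len(docs))]
    # Stage 2: run the order-aware scorer on the shortlist only.
    ranked = sorted(candidates, key=lambda i: rerank_score(query, docs[i]), reverse=True)
    return [int(i) for i in ranked[:final]]
```

The expensive model is called k times per query instead of once per document, which is where the order-of-magnitude saving comes from.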