The model upgrade looked routine. A new embedding model had topped the leaderboard — better on long documents, cheaper per token, the obvious move. The team pointed their ingestion pipeline at it, kicked off a background job to re-embed 10 million vectors in place, and shipped. Latency held. Error rate flat. CPU normal. The reindex chewed through the corpus over the next six hours, and across those six hours, retrieval quality fell off a cliff — and not one dashboard so much as twitched.
This is the failure nobody warns you about, because it doesn't look like a failure. It looks like nothing.
The bill you're afraid of is the wrong bill
Ask a team what scares them about changing embedding models and they'll point at the re-embedding cost. Millions of documents, back through an API, surely that's the expensive part. It isn't, and it's getting cheaper every quarter. Qdrant's own migration tutorial puts a small corpus at a few hours and a few dollars. The API invoice is a rounding error.
The real cost is structural, and it's this: a vector from model A and a vector from model B do not live in the same space. They are not two dialects of one language; they are two unrelated coordinate systems that happen to share a number of axes. As Milvus's documentation puts it plainly, embeddings from different models can differ in dimensionality and scaling in ways that "preclude direct comparison of object coordinates, or even of the distances between objects." Cosine similarity between a model-A document and a model-B query isn't a worse number. It's a meaningless one.
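To make that concrete, here is a toy sketch in plain Python. It assumes two hypothetical "models", embed_a and embed_b, that encode exactly the same information but lay it out on different axes (embed_b shuffles the axes and flips some signs). Real models differ in far messier ways, so treat this as an illustration of the geometry, not a measurement of any real model pair.

```python
import math
import random

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def embed_a(latent):
    """Hypothetical model A: uses the latent meaning vector as-is."""
    return list(latent)

def make_embed_b(dim, rng):
    """Hypothetical model B: same information, axes shuffled and signs flipped."""
    perm = list(range(dim))
    rng.shuffle(perm)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(dim)]

    def embed_b(latent):
        return [signs[i] * latent[perm[i]] for i in range(dim)]

    return embed_b

rng = random.Random(0)
dim = 256
embed_b = make_embed_b(dim, rng)

# A document and a query that are close in meaning (query = doc + small noise).
doc = [rng.gauss(0, 1) for _ in range(dim)]
query = [d + 0.3 * rng.gauss(0, 1) for d in doc]

same_a = cosine(embed_a(query), embed_a(doc))  # both in space A
same_b = cosine(embed_b(query), embed_b(doc))  # both in space B
cross = cosine(embed_b(query), embed_a(doc))   # query in B, document in A

print(f"A vs A: {same_a:.3f}  B vs B: {same_b:.3f}  B vs A: {cross:.3f}")
```

Within either space the pair scores as near neighbors. Across spaces the score carries no information about meaning, even though both vectors have the same length and the call succeeds.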
Why the rolling reindex is a trap
Now hold that fact next to what a naive in-place reindex actually does. It walks your index document by document, overwriting each old vector with a new one. For the entire duration of the backfill, your index contains both: documents already migrated into the new space, and documents still sitting in the old one — in the same index, answering the same queries.
Your query gets embedded with exactly one model. So every nearest-neighbor lookup ranks candidates from two unrelated neighborhoods against a single yardstick. The candidates that share the query's space are genuinely near; the rest are noise that happens to score well by accident, and the split between the two shifts as the backfill advances. Recall sags. The answers get subtly, unaccountably worse. And because every vector is the right shape and every request returns in time, your observability stack sees a perfectly healthy system.
Latency dashboards measure whether the index answered. They cannot measure whether it answered from the right space.
There's a name worth borrowing for this: index drift. Not data drift, not concept drift — your index is literally drifting between two geometries while it serves traffic.
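A small simulation shows how the drift appears in recall and nowhere else. Everything here is an assumption made for illustration: embed_old and embed_new are toy stand-ins for the two models, the corpus is random, queries are embedded with the new model, and the backfill migrates documents in index order.

```python
import math
import random

rng = random.Random(42)
DIM, N_DOCS, N_QUERIES = 32, 300, 60

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy stand-in for "a different model": shuffled axes with random sign flips.
perm = list(range(DIM))
rng.shuffle(perm)
signs = [rng.choice((-1.0, 1.0)) for _ in range(DIM)]

def embed_old(latent):
    return normalize(latent)

def embed_new(latent):
    return normalize([signs[i] * latent[perm[i]] for i in range(DIM)])

latents = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N_DOCS)]
# Each query is a noisy copy of one target document, spread evenly over the corpus.
targets = [j * (N_DOCS // N_QUERIES) for j in range(N_QUERIES)]
queries = [[x + 0.3 * rng.gauss(0, 1) for x in latents[t]] for t in targets]

def recall_at_1(migrated_fraction):
    """Recall@1 when the first `migrated_fraction` of documents has been re-embedded."""
    cutoff = int(migrated_fraction * N_DOCS)
    index = [
        embed_new(lat) if i < cutoff else embed_old(lat)
        for i, lat in enumerate(latents)
    ]
    hits = 0
    for target, q in zip(targets, queries):
        qv = embed_new(q)  # the query is embedded with exactly one model
        best = max(range(N_DOCS), key=lambda i: dot(qv, index[i]))
        hits += best == target
    return hits / N_QUERIES

recall_by_fraction = {f: recall_at_1(f) for f in (0.0, 0.5, 1.0)}
for fraction, recall in recall_by_fraction.items():
    print(f"migrated {fraction:.0%}: recall@1 = {recall:.2f}")
```

Every lookup in this loop returns a result of the right shape, so nothing here would trip a latency or error alert. Only a recall measurement against known-good answers exposes the mixed index.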
The correct mental model is a schema migration
Here is the reframe that fixes everything. You are not swapping a model. You are running a database schema migration, and you already know how to do those safely. You never mutate a live column in place and hope; you add the new column, write to both, backfill, verify, and cut over. Apply that discipline verbatim to vectors:
- Dual-write. Every new or updated document gets embedded by both models and stored as two vectors. Qdrant offers this directly as named vectors on one collection; with pgvector you add an embedding_v2 column, as the Google Cloud Community migration guide lays out.
- Version every vector. Persist model_name and model_version (Mixpeek recommends a source_hash too) alongside each vector. A vector with no provenance is a liability.
- Scope every query to one version. A query embedded by the new model searches only new-model vectors. This is the rule that makes drift impossible: the two spaces never meet in a ranking.
- Backfill in the background. Batch through the historical corpus — small batches, a handful of workers — and throttle on replication lag, as the dbi services pgvector write-up details.
- Cut over atomically. Flip the read path to the new version only after the new space is fully populated and you've validated recall on a held-out set. Then retire the old vectors.
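The steps above can be sketched with an in-memory stand-in for the vector store. This is a minimal illustration, not any database's API: VersionedIndex, VectorRecord, toy_old, and toy_new are hypothetical names, the two "embedders" are trivial character counters, and recall validation is reduced to a flag the caller must pass.

```python
import hashlib
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class VectorRecord:
    doc_id: str
    model_name: str
    model_version: str
    source_hash: str  # hash of the text that produced this vector
    vector: tuple

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class VersionedIndex:
    def __init__(self, read_version):
        self.records = {}  # (doc_id, (model_name, model_version)) -> VectorRecord
        self.read_version = read_version

    def write(self, doc_id, text, embedders):
        """Dual-write: store one vector per model listed in `embedders`."""
        source_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
        for (name, version), embed in embedders.items():
            record = VectorRecord(doc_id, name, version, source_hash, tuple(embed(text)))
            self.records[(doc_id, (name, version))] = record

    def search(self, query_text, embedders, k=3):
        """Embed the query with the read version; rank only that version's vectors."""
        qv = embedders[self.read_version](query_text)
        pool = [r for (_, ver), r in self.records.items() if ver == self.read_version]
        pool.sort(key=lambda r: cosine(qv, r.vector), reverse=True)
        return pool[:k]

    def coverage(self, version):
        """Fraction of known documents that have a vector for `version`."""
        doc_ids = {doc_id for doc_id, _ in self.records}
        have = {doc_id for doc_id, ver in self.records if ver == version}
        return len(have) / len(doc_ids) if doc_ids else 0.0

    def cut_over(self, version, recall_validated):
        """Flip the read path only when the backfill is complete and recall is checked."""
        if self.coverage(version) < 1.0:
            raise RuntimeError("backfill incomplete; refusing to cut over")
        if not recall_validated:
            raise RuntimeError("recall not validated on a held-out set")
        self.read_version = version

# Hypothetical toy embedders with different dimensionality.
OLD = ("toy-old", "1")
NEW = ("toy-new", "1")

def toy_old(text):
    return [text.count(c) for c in "aeiou"]

def toy_new(text):
    return [text.count(c) for c in "uoiea"] + [len(text)]

embedders = {OLD: toy_old, NEW: toy_new}
index = VersionedIndex(read_version=OLD)
index.write("doc-1", "embedding migration guide", {OLD: toy_old})  # legacy, old vector only
index.write("doc-2", "vector database tutorial", embedders)        # dual-written
results = index.search("migration guide", embedders)
print([(r.doc_id, r.model_name) for r in results], index.coverage(NEW))
```

The design choice that matters is in search: the version filter is applied before ranking, so an old-space vector can never be scored against a new-space query. The backfill then amounts to calling write with only the new embedder for each legacy document until coverage reaches 1.0.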
The internal links below go deeper on which model to actually pick — the embedding-model field for RAG agents, the Voyage vs OpenAI vs Cohere vs Gemini head-to-head, and the vector-database choice that determines whether dual-write is one flag or a weekend. But the picking is the easy half. The migration is the half that breaks production.
The shortcut that might let you skip the re-embed
If re-encoding ten million vectors is genuinely prohibitive, there's a newer option worth knowing. A "drift adapter" is a small learned transform, trained on a sample of paired old/new embeddings, that maps new-model queries into your existing old-model index, so you keep the index you already built. The EMNLP 2025 Drift-Adapter paper reports that a simple linear map (orthogonal Procrustes or low-rank affine) recovers 95–99% of full re-embedding's recall, at under 10 microseconds of added query latency. That figure is the authors' own result, measured on MTEB and a CLIP upgrade, not a universal guarantee. But it reframes "do I have to re-embed everything?" from a yes/no into a cost curve.
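For a sense of what the orthogonal Procrustes variant involves, here is a minimal NumPy sketch. It is not the paper's implementation. The data is synthetic, and the toy builds in a strong assumption: the old space is an exact rotation of the new space plus a little noise, with both spaces sharing one dimensionality. Real model pairs will fit less cleanly.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_pairs = 64, 500

# Synthetic paired sample: the same documents embedded by both "models".
# Assumption for this toy: old = new rotated by an unknown orthogonal matrix, plus noise.
true_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
new_sample = rng.normal(size=(n_pairs, dim))
old_sample = new_sample @ true_rotation + 0.05 * rng.normal(size=(n_pairs, dim))

def fit_orthogonal_procrustes(src, dst):
    """Return the orthogonal W that minimizes ||src @ W - dst|| (Frobenius norm)."""
    u, _, vt = np.linalg.svd(src.T @ dst)
    return u @ vt

W = fit_orthogonal_procrustes(new_sample, old_sample)

# At query time: embed with the new model, map into the old space, search the old index.
new_query = rng.normal(size=dim)
adapted = new_query @ W
ideal = new_query @ true_rotation  # where the query would sit in the old space
cos = float(adapted @ ideal / (np.linalg.norm(adapted) * np.linalg.norm(ideal)))
print(f"cosine between adapted and ideal query: {cos:.4f}")
```

The per-query cost is one matrix-vector product, which is why the added latency can be small. Whether recall holds up depends on how close to linear the relationship between your two models really is, so the same held-out recall check applies before trusting the adapter.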
So: when the next better model ships — and it ships constantly — don't reach for the in-place reindex. Reach for the migration playbook you already trust for your database. The green dashboard isn't reassurance. During a vector migration, the green dashboard is the trap.



