You built a retrieval pipeline. A user invokes their right to be forgotten. You run DELETE FROM embeddings WHERE user_id = 'u_8842', the call returns success, and you close the ticket.
You are not done. You are maybe five percent done.
The uncomfortable truth about erasure in a RAG system is that by the time the request arrives, the person's data is no longer in one place. It was ingested, chunked, embedded, cached, logged, and — if you were unlucky — trained on. GDPR Article 17 does not give you the right to erase a row; it gives the data subject the right to have their personal data erased, wherever it landed. Article 12(3) starts a one-month clock on that. The single query you ran addresses one copy. This piece is about the others, and about the fact that even the copy you did target probably isn't gone yet.
The delete is a tombstone
Start with the store you think you handled. In most vector databases, a delete is a logical operation, not an immediate physical removal — and the gap between the two is where compliance quietly fails.
- Milvus marks deleted entities as logically removed and only reclaims their space during compaction, a background pass that merges segments and purges the tombstoned records.
- pgvector's HNSW index leaves dead tuples in the proximity graph after a DELETE. They stay there, traversable and on disk, until VACUUM repairs the graph and frees them. There is a long-standing issue about recall and bloat from exactly this.
- Qdrant applies the delete immediately if you pass wait=true, but the underlying segment storage is reclaimed on a later optimize pass.
So the vector you "deleted" can still occupy disk and can still be walked by a nearest-neighbor search until housekeeping runs. For an engineering bug that's a footnote. For a legal deadline it's the whole point: the event GDPR cares about is the physical purge, not your API's 200 OK. If your compaction or vacuum schedule is measured in weeks, your erasure is measured in weeks — and you need it inside a month.
The question is never "did the delete succeed?" It's "when does the byte actually leave the disk?" — and in a vector database those are two different timestamps.
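One way to keep those two timestamps apart is to record them separately and check the physical one against the deadline. The sketch below is illustrative only: `ErasureRecord`, `next_purge_on`, and `add_one_month` are hypothetical names, and the calendar arithmetic is a simplification of how the one-month period is counted.

```python
import calendar
from dataclasses import dataclass
from datetime import date


def add_one_month(d: date) -> date:
    """Same day in the next month, clamped to that month's last day."""
    year = d.year + (1 if d.month == 12 else 0)
    month = 1 if d.month == 12 else d.month + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


@dataclass
class ErasureRecord:
    requested_on: date       # when the erasure request was received
    logical_delete_on: date  # when the delete call returned success
    next_purge_on: date      # assumed: next scheduled compaction or VACUUM run

    def deadline(self) -> date:
        return add_one_month(self.requested_on)

    def purged_in_time(self) -> bool:
        # The check is against the physical purge, not the logical delete.
        return self.next_purge_on <= self.deadline()


record = ErasureRecord(
    requested_on=date(2024, 1, 31),
    logical_delete_on=date(2024, 1, 31),
    next_purge_on=date(2024, 3, 15),
)
print(record.deadline(), record.purged_in_time())
```

In this made-up example the delete call succeeds on day one, yet the scheduled purge falls after the deadline, so the record reports that the erasure would be late.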
The metadata escape hatch you assumed you had
The obvious design is: tag every vector with a user_id, and when erasure comes, delete by that filter. It's clean. It also doesn't work everywhere.
Pinecone serverless does not support deleting by metadata filter. On that tier you can delete by ID or by namespace — not by an arbitrary tag. Teams discover this the first time they try to run their GDPR path in production, which is the worst possible time. Qdrant and Weaviate do support filter- and tenant-scoped deletes, so the capability is real — it's just not portable, and "we'll delete by filter" is not a plan you can write down once and trust across engines and tiers.
The design that is portable is to stop relying on a scan-and-match at delete time and isolate at ingest time: one namespace, tenant, or collection per subject — the same multi-tenant isolation you'd reach for to keep one customer's data out of another's results. Then erasure is "drop the partition" — a single cheap operation whose completion you can actually prove, which matters more than it sounds, because "prove it's gone" is a real GDPR obligation and "we filtered on a tag" is a much weaker attestation than "we dropped the container."
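The shape of that design can be shown with a toy in-memory store. This is not any vendor's API: `PartitionedIndex` and its methods are hypothetical, and a real engine would map `drop_subject` onto its own namespace, tenant, or collection drop.

```python
from collections import defaultdict


class PartitionedIndex:
    """Toy stand-in for a vector store with one partition per data subject."""

    def __init__(self):
        # subject_id -> {chunk_id: vector}
        self._partitions = defaultdict(dict)

    def upsert(self, subject_id, chunk_id, vector):
        # Isolation happens here, at ingest: the subject picks the partition.
        self._partitions[subject_id][chunk_id] = vector

    def drop_subject(self, subject_id):
        """Drop the whole partition and return how many vectors it held."""
        return len(self._partitions.pop(subject_id, {}))

    def has_subject(self, subject_id):
        return subject_id in self._partitions


index = PartitionedIndex()
index.upsert("u_8842", "chunk-1", [0.1, 0.2])
index.upsert("u_8842", "chunk-2", [0.3, 0.4])
index.upsert("u_0001", "chunk-1", [0.5, 0.6])

removed = index.drop_subject("u_8842")
print(removed, index.has_subject("u_8842"), index.has_subject("u_0001"))
```

The attestation becomes a membership check on the partition rather than a claim that a filter matched every tagged row.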
The copies you forgot you made
Now the fan-out. Walk the data's actual path through your system and you'll find it in places your delete query never touched:
- The source chunk store. You embedded from something. The original text usually lives in its own table or object store — the same one you already reconcile when you keep the index in sync with the source — entirely separate from the vectors. Delete the vectors, keep the chunks, and you've deleted the search index while retaining the personal data verbatim.
- The semantic cache. If you cache responses by prompt similarity, a user's data is sitting in cached prompt/response pairs. Many caches (GPTCache among them) have no "delete everything for user X" API — they're keyed by content, not by subject. The mitigation is to scope and TTL the cache so entries age out, not to pretend it isn't a copy.
- The trace and eval logs. Every request you sent to LangSmith or Langfuse for observability carries inputs and outputs, often the very PII in question. Both support deletion (delete_runs, a delete API), but both soft-delete first and purge physically on a delay. Fine, as long as you confirm that delay lands inside your one-month window.
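The scope-and-TTL mitigation for the semantic cache can be sketched as follows. This is a simplified illustration, not GPTCache's interface: `ScopedTTLCache` is hypothetical, and it uses exact-match lookup where a real semantic cache would match on embedding similarity.

```python
import time


class ScopedTTLCache:
    """Cache keyed by (subject, prompt) whose entries expire after a TTL."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries = {}   # (subject_id, prompt) -> (expires_at, response)

    def put(self, subject_id, prompt, response):
        expires_at = self._clock() + self._ttl
        self._entries[(subject_id, prompt)] = (expires_at, response)

    def get(self, subject_id, prompt):
        key = (subject_id, prompt)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, response = entry
        if self._clock() >= expires_at:
            # Expiry is lazy: the entry is removed when it is next read.
            del self._entries[key]
            return None
        return response

    def purge_subject(self, subject_id):
        """Remove every entry for one subject; return how many were removed."""
        keys = [k for k in self._entries if k[0] == subject_id]
        for key in keys:
            del self._entries[key]
        return len(keys)
```

Because the subject is part of the key, an erasure request can call `purge_subject` directly, and the TTL bounds how long anything missed can survive. Since expiry here is lazy, a production version would also need a periodic sweep so that unread entries actually leave memory.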
None of these are exotic. They're the standard furniture of a production RAG stack. The failure mode isn't ignorance of any one of them; it's not having a written list of all of them, so the erasure runbook covers the index and misses everything else.
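A written list can double as code: one registry of every store that holds a copy, and a runner that calls each eraser and records the outcome. The sketch below is a minimal illustration; `run_erasure`, the store names, and the eraser callables are all hypothetical placeholders for whatever your stack actually exposes.

```python
from typing import Callable, Dict

# Each eraser takes a subject id and returns a short description of what it did.
Eraser = Callable[[str], str]


def run_erasure(subject_id: str, erasers: Dict[str, Eraser]) -> Dict[str, str]:
    """Call every registered eraser and record failures instead of stopping."""
    receipt = {}
    for store_name, erase in erasers.items():
        try:
            receipt[store_name] = "ok: " + erase(subject_id)
        except Exception as exc:
            # Keep going so one failing store does not hide the others.
            receipt[store_name] = f"FAILED: {exc}"
    return receipt


def cache_eraser(subject_id: str) -> str:
    # Placeholder failure to show how an unreachable store is reported.
    raise RuntimeError("cache unreachable")


receipt = run_erasure(
    "u_8842",
    {
        "vector_index": lambda s: "partition dropped",
        "chunk_store": lambda s: "rows deleted",
        "semantic_cache": cache_eraser,
        "trace_logs": lambda s: "delete requested, purge pending",
    },
)
for store, outcome in receipt.items():
    print(f"{store}: {outcome}")
```

The receipt gives the ticket something concrete to attach, and a store that is missing from the registry is visibly missing from the receipt.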
The one-way door
Then there's the copy you can't take back. If you fine-tuned a model on data that includes the user's information, deleting the training rows does not remove what the weights learned. "Machine unlearning" is an active research area, and the honest state of it is that it doesn't reliably work yet: recent evaluations conclude that current methods for verifying that a model has actually forgotten something give inconclusive results, since a model can suppress a fact in its outputs while still encoding it internally. You cannot attest to a regulator that a fine-tune has forgotten a person when the field cannot yet attest it to itself.
The remedy is not a better unlearning algorithm you're hoping ships next quarter. It's architectural, and you make the decision before you train: keep erasable personal data in the retrieval layer, where a delete is a real delete, and out of anything whose gradient you compute. Retrieval is the tractable place to be forgotten. Weights are the one-way door.
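As a rough sketch of making that decision in the data path, a guard can split records before any training set is assembled. It assumes something upstream has already labeled which records carry personal data, which is the hard part and is not shown here; `Record` and `split_for_training` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    text: str
    # Assumed labeling: set when the record contains a subject's personal data.
    subject_id: Optional[str] = None


def split_for_training(
    records: Iterable[Record],
) -> Tuple[List[Record], List[Record]]:
    """Return (trainable, retrieval_only); personal data never reaches training."""
    trainable: List[Record] = []
    retrieval_only: List[Record] = []
    for record in records:
        if record.subject_id is None:
            trainable.append(record)
        else:
            retrieval_only.append(record)
    return trainable, retrieval_only


trainable, retrieval_only = split_for_training(
    [
        Record("How to reset a password"),
        Record("Ticket from a customer about billing", subject_id="u_8842"),
    ]
)
print(len(trainable), len(retrieval_only))
```

Everything in `retrieval_only` stays in a layer where it can later be deleted; only `trainable` is ever handed to a fine-tuning job.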
Erasure in RAG isn't a query; it's a distributed-systems problem with a legal deadline attached. The data fanned out, so your delete has to fan out too — to the index and its physical compaction, the chunk store, the cache, the traces — and the design that keeps that tractable is deciding, at ingest, that each subject's data lives somewhere you can drop whole. Do that, and "right to be forgotten" is a partition drop you can prove. Skip it, and it's a 200 OK that left the data exactly where it was.



