Every B2B RAG product hits the same wall on day one of having a second customer: their documents and yours now live in the same vector index, and the only thing keeping Acme's answers out of Globex's chat is a line of code. Get that line right every time and you have a tidy multi-tenant system. Get it wrong once and you have a data-leak incident and a very bad email to write.

The vendor docs present this as a menu of features — Pinecone namespaces, Weaviate tenants, Qdrant payload partitioning, Milvus partition keys — and developers tend to shop it like one, comparing limits and pricing. That framing hides the only decision that matters. The three real patterns differ less in what they can do than in where the tenant boundary lives, and therefore in what happens the one time your code forgets it's there.

Pattern 1: metadata filtering — isolation by discipline

The simplest approach puts every tenant's vectors in one shared index and tags each with a tenant_id, then filters on it at query time. It works, it's cheap to build, and it has a quiet flaw: the filter is load-bearing in a way nothing enforces.

A vector index isn't a table you can JOIN against; it's an HNSW graph whose edges span all tenants. When you query, the search walks that shared graph and the tenant filter is the single mechanism deciding which neighbors are eligible. Omit it in one code path — a new endpoint, a background reindex job, a refactor that drops a parameter — and the search happily returns another customer's nearest neighbors. There's no error. It just leaks. Pinecone makes this subtext explicit, recommending per-namespace isolation precisely to reduce "the risk of application bugs that could query the wrong tenant's data."
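To make the failure mode concrete, here is a minimal sketch using a toy in-memory index with brute-force dot-product ranking in place of a real HNSW search. The `Record` type, the `search` function, and the sample data are all hypothetical, not any vendor's API; the only point is what happens when the optional `tenant_id` argument is left off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    doc_id: str
    tenant_id: str
    vector: tuple


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def search(index, query, top_k, tenant_id=None):
    """Toy shared-index search. tenant_id is optional, which is the hazard."""
    candidates = [r for r in index if tenant_id is None or r.tenant_id == tenant_id]
    ranked = sorted(candidates, key=lambda r: dot(r.vector, query), reverse=True)
    return ranked[:top_k]


# Hypothetical shared index holding two tenants' vectors side by side.
INDEX = [
    Record("acme-1", "acme", (1.0, 0.0)),
    Record("acme-2", "acme", (0.8, 0.2)),
    Record("globex-1", "globex", (0.9, 0.1)),
    Record("globex-2", "globex", (0.0, 1.0)),
]

query = (1.0, 0.0)

# Correct call: only Globex's records are eligible.
scoped = search(INDEX, query, top_k=2, tenant_id="globex")

# Same call with the filter forgotten: no exception, just mixed tenants.
unscoped = search(INDEX, query, top_k=2)

print([r.doc_id for r in scoped])    # ['globex-1', 'globex-2']
print([r.doc_id for r in unscoped])  # ['acme-1', 'globex-1']
```

Nothing in the second call looks wrong to a type checker, a linter, or a reviewer skimming the diff, which is why this pattern depends on discipline.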

There's a cost tax, too. On a shared index your query touches the whole corpus, not just the tenant's slice. Pinecone's own example: querying a 1GB namespace costs 1 request unit, while metadata-filtering for the same data inside a 100GB index costs 100 — you pay to scan everyone to answer for one.
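As a rough back-of-the-envelope sketch, the arithmetic behind that example looks like this. The linear rate of 1 request unit per GB scanned is an assumption extrapolated from the two figures quoted above, and the query volume is made up for illustration; check current pricing before relying on either.

```python
# Assumption: cost scales linearly with the data a query has to touch,
# extrapolated from the 1 GB -> 1 RU and 100 GB -> 100 RU figures above.
RU_PER_GB_SCANNED = 1.0


def query_cost_ru(scanned_gb):
    """Estimated request units for one query over `scanned_gb` of data."""
    return scanned_gb * RU_PER_GB_SCANNED


namespace_cost = query_cost_ru(1)   # tenant isolated in a 1 GB namespace
shared_cost = query_cost_ru(100)    # same tenant filtered inside a 100 GB index

# Hypothetical volume: one million queries a month for this tenant.
QUERIES_PER_MONTH = 1_000_000
extra_ru_per_month = (shared_cost - namespace_cost) * QUERIES_PER_MONTH

print(namespace_cost, shared_cost, extra_ru_per_month)
```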

Pattern 2: namespaces and tenants — isolation by construction

The better default moves the boundary out of your query code and into the database. The names differ; the idea is identical.

Pinecone namespaces physically separate each tenant's data in the serverless backend — reads and writes target exactly one namespace, so "the behavior of one tenant does not affect other tenants." You get up to 10,000 namespaces per index, with million-scale supported. Weaviate puts each tenant on a dedicated shard and is built for absurd tenant counts: the docs cite 50,000+ active shards per node and "1M concurrently active tenants with just 20 or so nodes," plus the ability to offload cold tenants to object storage. Milvus hashes a partition key to scale past its hard cap of 1,024 raw partitions, reaching 10M+ tenants. Even Qdrant, which recommends a single collection, gets you here: tag points with a group_id, build a keyword payload index marked is_tenant=true (the docs call this "crucial for performance," since it co-locates a tenant's data on disk), and filter on it.

The win isn't a feature; it's that a forgotten scope can no longer leak across the wall. A query without a namespace doesn't silently fan out across customers; it's scoped by construction. You've converted a runtime assertion you must never miss into a structural property the database holds for you. And none of this requires a different product: multi-tenancy is a mode of the same vector databases you're already choosing between.
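The structural difference can be sketched with another toy store, this time one that keeps each namespace in its own container. `NamespacedStore` is hypothetical and is not how any of these engines is implemented; it only illustrates the shape of the guarantee. How a real engine responds to a missing or unknown scope varies (an error in some, a default or empty partition in others), so treat the exceptions below as an assumption.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class NamespacedStore:
    """Toy store: one list per namespace, so no search path crosses tenants."""

    def __init__(self):
        self._namespaces = {}

    def upsert(self, namespace, doc_id, vector):
        if not namespace:
            raise ValueError("namespace is required")
        self._namespaces.setdefault(namespace, []).append((doc_id, vector))

    def query(self, namespace, vector, top_k):
        # namespace is a required positional argument: omitting it fails
        # at call time instead of widening the search.
        if not namespace:
            raise ValueError("namespace is required")
        if namespace not in self._namespaces:
            raise KeyError(f"unknown namespace: {namespace!r}")
        rows = self._namespaces[namespace]
        ranked = sorted(rows, key=lambda row: dot(row[1], vector), reverse=True)
        return [doc_id for doc_id, _ in ranked[:top_k]]


store = NamespacedStore()
store.upsert("acme", "acme-1", (1.0, 0.0))
store.upsert("globex", "globex-1", (0.9, 0.1))
store.upsert("globex", "globex-2", (0.0, 1.0))

# Acme's vector is the closest match overall, but it is unreachable from here.
globex_hits = store.query("globex", (1.0, 0.0), top_k=5)
print(globex_hits)  # ['globex-1', 'globex-2']
```

Calling `store.query(vector=(1.0, 0.0), top_k=5)` without a namespace raises a `TypeError` before any data is read, which is the loud failure the shared-index version never gives you.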

Metadata filtering makes isolation something you must remember. Namespaces make it something you can't forget.

Pattern 3: a collection per tenant — strong, and a trap at scale

The instinct that "real isolation means a separate database per customer" isn't wrong about the isolation — it's wrong about the scale. Spinning up a collection or index per tenant gives you the hardest separation available, and every vendor tells you not to do it past a handful. Qdrant: "It is not recommended to create hundreds and thousands of collections per cluster as it increases resource overhead unsustainably." Pinecone calls separate indexes "highly resource and cost-intensive." Each collection carries its own memory, segments, and index structures, so a thousand of them is a thousand fixed costs. Reserve this for the few enterprise tenants whose contracts demand a dedicated store — not as your default tenancy model.
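One way to keep dedicated stores as the exception rather than the default is a small routing layer, sketched below. The tenant names, collection names, and the `Target` type are all hypothetical; the idea is only that a short allowlist maps contract-bound tenants to their own collection while everyone else lands in a namespace of the shared one.

```python
from typing import NamedTuple, Optional


class Target(NamedTuple):
    collection: str
    namespace: Optional[str]


# Hypothetical allowlist: the few tenants whose contracts require a dedicated store.
DEDICATED_COLLECTIONS = {
    "megacorp": "megacorp_dedicated",
}
SHARED_COLLECTION = "shared_docs"


def resolve_target(tenant_id):
    """Map a tenant to the collection and namespace its queries must use."""
    if not tenant_id:
        raise ValueError("tenant_id is required")
    dedicated = DEDICATED_COLLECTIONS.get(tenant_id)
    if dedicated is not None:
        # The whole collection belongs to this tenant, so no namespace is needed.
        return Target(collection=dedicated, namespace=None)
    # Default path: shared collection, one namespace per tenant.
    return Target(collection=SHARED_COLLECTION, namespace=tenant_id)


print(resolve_target("megacorp"))
print(resolve_target("acme"))
```

Keeping the allowlist explicit also makes the fixed cost visible: every entry added to `DEDICATED_COLLECTIONS` is another full set of index structures to pay for.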

The filter detail that bites either way

One more thing that survives every pattern: how the filter combines with the search. Post-filtering runs the vector search first and then drops non-matching results, which means a selective tenant filter can return too few rows — or none — because the top-k came back full of other tenants. Pre-filtering restricts candidates to the tenant before searching, which is what you want. Modern engines handle the hard case (a highly selective filter that fragments the HNSW graph into disconnected islands) with strategies like Weaviate's ACORN, now its default. For multi-tenancy the rule is simple: the tenant scope must be a pre-filter or a hard partition, never a post-hoc trim.
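The difference is easy to see on a toy corpus. The sketch below uses brute-force ranking instead of an approximate index, and the data and function names are hypothetical, but the ordering of the two steps is the same one a real engine has to choose between.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Hypothetical shared corpus: (doc_id, tenant_id, vector).
# Acme's documents happen to sit closest to the query.
CORPUS = [
    ("acme-1", "acme", (1.0, 0.0)),
    ("acme-2", "acme", (0.95, 0.05)),
    ("acme-3", "acme", (0.9, 0.1)),
    ("globex-1", "globex", (0.5, 0.5)),
    ("globex-2", "globex", (0.2, 0.8)),
]


def post_filter_search(query, tenant_id, top_k):
    """Search everything first, then drop rows from other tenants."""
    ranked = sorted(CORPUS, key=lambda row: dot(row[2], query), reverse=True)
    return [doc_id for doc_id, tenant, _ in ranked[:top_k] if tenant == tenant_id]


def pre_filter_search(query, tenant_id, top_k):
    """Restrict candidates to the tenant, then search within them."""
    eligible = [row for row in CORPUS if row[1] == tenant_id]
    ranked = sorted(eligible, key=lambda row: dot(row[2], query), reverse=True)
    return [doc_id for doc_id, _, _ in ranked[:top_k]]


query = (1.0, 0.0)
post_hits = post_filter_search(query, "globex", top_k=3)
pre_hits = pre_filter_search(query, "globex", top_k=3)

print(post_hits)  # []
print(pre_hits)   # ['globex-1', 'globex-2']
```

The post-filtered call returns nothing because all three top-k slots were taken by Acme's documents before the filter ran; Globex has relevant data, and the user sees an empty answer.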

Pick the pattern by where you want to be standing the day someone ships a query without the filter. With metadata filtering, that day is an incident. With namespaces, it's an error or an empty result, not a leak. Choose the boundary that fails safe, and let the database, not your discipline, be the thing that holds the wall.