Two things get filed under the same word, and the filing is the problem.

The first is a knowledge base: a corpus of documents, code, or tickets that you embed once, share across every user, and query occasionally to ground an answer. The second is memory: what one agent, acting for one person, did and saw and concluded — the running record it consults to stay coherent from one turn to the next. Both use vector search. They are not the same workload, and treating them as one is why so many agent memory setups feel wrong.

A knowledge base is one big index that many people read. Memory is a million small indexes, each read by exactly one agent, constantly.

Line up the properties and they point in opposite directions. A knowledge base is large, shared, read-mostly, and latency-tolerant — a few hundred milliseconds to fetch grounding is fine. Memory is small, private, write-heavy, and latency-critical — the agent touches it on nearly every turn, and it belongs to one user who would rather it not sit in a shared multi-tenant store at all. The hosted vector database, the thing the whole industry reached for first, is superb at the first profile and structurally bad at the second.

Why the server is the wrong home for memory

Put per-agent memory in a hosted vector DB — a Qdrant or Milvus or Weaviate cluster, or a dedicated agent-memory server — and you can make it work with namespaces or per-tenant collections. Then scale it. Now one cluster holds millions of tiny indexes, most of them cold most of the time, each carrying per-namespace overhead, and every recall — the hot path, the thing that runs on every turn — pays a network round-trip. You have also collected every user's private episodic memory into a single system with a single breach blast radius. Each of those is a direct consequence of the workload being small, per-user, and hot, which is the exact opposite of what the server was optimized for.

Embedded engines invert those costs at once. One database per user means there is no shared cluster carrying millions of cold namespaces. In-process means recall is a function call rather than a request. Offline-first means the agent's memory survives a dropped connection. Local means the private data never leaves the device unless you deliberately sync it. The awkward parts of the server design aren't tuned away; they're designed out.
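To make "one database per user, in-process" concrete, here is a minimal sketch using only Python's standard-library `sqlite3`. The one-file-per-user layout, the `episodes` table, and the helper names `open_memory`, `remember`, and `recall_recent` are assumptions made for illustration; they are not the API of any engine discussed here.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_memory(root: Path, user_id: str) -> sqlite3.Connection:
    """Open (or create) the private memory database for one user."""
    # Hypothetical layout: one SQLite file per user under `root`.
    # Assumes `user_id` is already a safe, validated identifier.
    conn = sqlite3.connect(str(root / f"{user_id}.db"))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS episodes ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "turn INTEGER NOT NULL, "
        "note TEXT NOT NULL)"
    )
    return conn


def remember(conn: sqlite3.Connection, turn: int, note: str) -> None:
    """Append one observation to this user's memory."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO episodes (turn, note) VALUES (?, ?)", (turn, note)
        )


def recall_recent(conn: sqlite3.Connection, limit: int = 3) -> list:
    """Recall is a plain function call: no network hop, no shared index."""
    rows = conn.execute(
        "SELECT note FROM episodes ORDER BY turn DESC, id DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return [row[0] for row in rows]


root = Path(tempfile.mkdtemp())
alice = open_memory(root, "alice")
bob = open_memory(root, "bob")

remember(alice, 1, "prefers metric units")
remember(alice, 2, "asked about a flight to Oslo")
remember(bob, 1, "working on a Rust parser")

alice_notes = recall_recent(alice)
bob_notes = recall_recent(bob)
```

Each user's file can be opened, backed up, or deleted independently, and nothing in one file is reachable from a query against another.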

The three claiming this half

sqlite-vec is the pragmatist's answer. It is a vector-search extension for SQLite — dependency-free C, MIT/Apache licensed, now sponsored by Mozilla — that does fast brute-force nearest-neighbor search over vectors sitting in the same file as your relational data. It runs anywhere SQLite runs: laptop, server, phone, Raspberry Pi, and the browser via WASM. If an agent's memory is thousands of items, not billions, brute force is not a compromise — it is the correct, boring, fast choice, and you get SQL joins against your metadata for free.
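The idea of brute-force search next to relational data can be sketched without any extension at all. The example below deliberately does not use sqlite-vec's API: it stores float32 vectors as BLOBs with the standard-library `sqlite3` module and registers a hypothetical `l2_distance` SQL function written in Python. The `memories` table and its columns are likewise assumptions. A real extension would do the distance computation in C, but the shape of the query is the point.

```python
import math
import sqlite3
import struct


def pack(vec):
    """Serialize a sequence of floats into a compact float32 BLOB."""
    return struct.pack(f"{len(vec)}f", *vec)


def unpack(blob):
    """Inverse of pack(): a float32 BLOB back to a tuple of floats."""
    return struct.unpack(f"{len(blob) // 4}f", blob)


def l2_distance(a_blob, b_blob):
    """Euclidean distance between two packed vectors."""
    a, b = unpack(a_blob), unpack(b_blob)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL (illustrative stand-in for an extension).
conn.create_function("l2_distance", 2, l2_distance)

conn.execute(
    "CREATE TABLE memories ("
    "id INTEGER PRIMARY KEY, kind TEXT, note TEXT, embedding BLOB)"
)
rows = [
    (1, "preference", "likes window seats", [1.0, 0.0, 0.0]),
    (2, "preference", "vegetarian meals", [0.0, 1.0, 0.0]),
    (3, "event", "booked a window seat to Oslo", [0.9, 0.1, 0.0]),
]
conn.executemany(
    "INSERT INTO memories VALUES (?, ?, ?, ?)",
    [(i, kind, note, pack(vec)) for i, kind, note, vec in rows],
)

query = pack([1.0, 0.0, 0.0])

# Brute force: score every row, sort by distance, keep the closest two.
top2 = [
    r[0]
    for r in conn.execute(
        "SELECT note FROM memories ORDER BY l2_distance(embedding, ?) LIMIT 2",
        (query,),
    )
]

# Ordinary SQL predicates on metadata compose with the vector ordering.
nearest_event = [
    r[0]
    for r in conn.execute(
        "SELECT note FROM memories WHERE kind = ? "
        "ORDER BY l2_distance(embedding, ?) LIMIT 1",
        ("event", query),
    )
]
```

At a few thousand rows a full scan like this stays cheap, which is why an index structure is optional at memory scale.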

ObjectBox comes at it from the embedded-database side: an on-device store with built-in vector search, ACID guarantees, and — the part that matters for memory — out-of-the-box data sync. It is engineered for mobile, IoT, robots, and hardware where CPU, memory, and battery are all scarce. If your agent lives on a device rather than in a datacenter, this is native ground.

Qdrant Edge is the server vendor conceding the point. Announced in private beta on July 30, 2025, it is an in-process, offline-first build of Qdrant that keeps the grown-up retrieval features — hybrid dense-plus-sparse search, multivector/ColBERT scoring, structured filtering — with no background service, plus selective device-to-cloud sync. That last feature is the whole thesis in one setting: decide, per item, what stays private on the device and what graduates to the shared corpus. It is still partner-curated and not generally available, so treat it as a signal of direction more than a thing you can pip install today.
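The per-item decision can be pictured with a small sketch. This is not Qdrant Edge's API, which is not publicly documented here; `MemoryItem`, its `shareable` flag, and `partition_for_sync` are hypothetical names that only illustrate the policy of splitting memory into what stays on the device and what is eligible to leave it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryItem:
    note: str
    shareable: bool  # hypothetical per-item flag set by the user or agent


def partition_for_sync(items):
    """Split items into (stays on device, eligible for upload)."""
    local_only = [item for item in items if not item.shareable]
    to_upload = [item for item in items if item.shareable]
    return local_only, to_upload


items = [
    MemoryItem("user's home address", shareable=False),
    MemoryItem("summary of public API docs", shareable=True),
    MemoryItem("notes from a private call", shareable=False),
]

local_only, to_upload = partition_for_sync(items)
```

Defaulting `shareable` to false at creation time would make privacy the baseline and sharing the explicit choice, which matches the "unless you deliberately sync it" stance above.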

The rule to take away

Stop asking "which vector database for my agent?" and ask "which half of the workload is this?"

"Agent memory" keeps feeling like a solved problem that isn't, because it was quietly handed to the wrong tier. The corpus can stay in the cloud. The memory wants to come home.