An agent without memory is a brilliant amnesiac: capable in the moment, useless across time. The context window is not memory; at best it is working memory, and an expensive one. Real agent memory means deciding what to persist, how to retrieve it, and, crucially, when to forget. These nine repositories represent the current state of that art, from the vector stores that hold the embeddings to the memory layers that decide what is worth holding at all.
The Memory Layers
The most direct attack on the problem is a dedicated memory layer that sits between your agent and its model. Mem0 has become the most-starred answer: a universal memory layer that extracts, stores, and recalls salient facts across sessions, so the agent that helped you yesterday recognizes you today. Letta, the project that grew out of the MemGPT research, takes the boldest position. It treats memory management the way an operating system treats virtual memory, with the agent itself paging information in and out of a hierarchical store.
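A toy two-tier store makes the paging idea concrete. A small, bounded core tier stands in for the context window, and an unbounded archival tier stands in for external storage. Least-recently-used entries are paged out when core is full and paged back in when they are read again. This is a sketch under stated assumptions, not Letta's API. The class and method names are hypothetical, and the fixed LRU policy is a simplification: in MemGPT the model itself largely decides what to move, through function calls.

```python
from collections import OrderedDict
from typing import Optional


class PagedMemory:
    """Toy two-tier memory illustrating MemGPT-style paging (hypothetical, not Letta's API).

    `core` is small and bounded, standing in for the context window;
    `archival` is unbounded, standing in for external storage.
    """

    def __init__(self, core_capacity: int) -> None:
        if core_capacity < 1:
            raise ValueError("core_capacity must be at least 1")
        self.core_capacity = core_capacity
        self.core: OrderedDict[str, str] = OrderedDict()  # least recently used first
        self.archival: dict[str, str] = {}

    def write(self, key: str, value: str) -> None:
        # New or updated entries land in core memory as the most recently used.
        self.archival.pop(key, None)
        self.core[key] = value
        self.core.move_to_end(key)
        self._evict()

    def read(self, key: str) -> Optional[str]:
        if key in self.core:
            self.core.move_to_end(key)  # refresh recency
            return self.core[key]
        if key in self.archival:
            # Page the entry back into core memory because it is needed again.
            self.core[key] = self.archival.pop(key)
            self._evict()
            return self.core[key]
        return None

    def _evict(self) -> None:
        # Page least recently used entries out to archival storage.
        while len(self.core) > self.core_capacity:
            old_key, old_value = self.core.popitem(last=False)
            self.archival[old_key] = old_value


mem = PagedMemory(core_capacity=2)
mem.write("name", "Ada")
mem.write("city", "London")
mem.write("language", "Python")  # core is full, so "name" is paged out
print(mem.read("name"))  # Ada (paged back in; "city" is paged out instead)
```

Nothing is lost when an entry leaves core memory; it only becomes more expensive to reach. That is the trade the operating-system analogy is about.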
Where Mem0 and Letta think in facts, Cognee and Zep think in structure. Cognee builds a self-hosted knowledge graph so an agent's memories are connected rather than merely retrieved. Zep maintains a temporal knowledge graph that tracks how facts change over time. It knows that your address from last year is no longer your address now. Flat vector search cannot make that distinction on its own, because it ranks by similarity, not by when a fact held.
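Temporal memory can be sketched with facts that carry a validity interval. Asserting a new address closes the interval on the old one instead of overwriting it. The store can then answer both "where does the user live?" and "where did they live last year?". This is a hypothetical illustration, not Zep's data model or API, and it assumes facts are asserted in chronological order.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    value: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the fact still holds


class TemporalFactStore:
    """Hypothetical store that closes superseded facts instead of overwriting them.

    Assumes facts are asserted in chronological order.
    """

    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def assert_fact(self, subject: str, predicate: str, value: str, as_of: date) -> None:
        # End the validity interval of the currently open fact for this subject/predicate.
        for fact in self.facts:
            if fact.subject == subject and fact.predicate == predicate and fact.valid_to is None:
                fact.valid_to = as_of
        self.facts.append(Fact(subject, predicate, value, valid_from=as_of))

    def value_at(self, subject: str, predicate: str, when: date) -> Optional[str]:
        # A fact holds on `when` if valid_from <= when < valid_to (open-ended if valid_to is None).
        for fact in self.facts:
            if (
                fact.subject == subject
                and fact.predicate == predicate
                and fact.valid_from <= when
                and (fact.valid_to is None or when < fact.valid_to)
            ):
                return fact.value
        return None


store = TemporalFactStore()
store.assert_fact("user", "address", "12 Elm St", as_of=date(2023, 3, 1))
store.assert_fact("user", "address", "9 Oak Ave", as_of=date(2024, 6, 15))
print(store.value_at("user", "address", date(2023, 12, 1)))  # 12 Elm St
print(store.value_at("user", "address", date(2025, 1, 1)))  # 9 Oak Ave
```

The design choice that matters is keeping the superseded fact rather than deleting it: the current answer stays unambiguous and the history stays queryable.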
The Vector Stores Underneath
Most memory ultimately rests on similarity search, and the vector database you choose shapes everything above it. Chroma is the developer-first default: embed, store, query, with almost no ceremony. Rewriting its core in Rust made it considerably faster. Qdrant occupies the performance tier: a Rust-built engine with rich filtering that scales from a laptop to a cluster without changing your code.
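Beneath the different APIs, the core operation is the same. Each store scores stored vectors against a query vector and returns the top k, optionally restricted by a metadata filter of the kind Qdrant emphasizes. The sketch below does this by brute force with cosine similarity. The record layout and function names are assumptions, and the three-dimensional vectors stand in for real embeddings. Production engines replace the linear scan with an approximate nearest-neighbour index.

```python
import math
from typing import Callable, Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    records: list[dict],
    query: list[float],
    k: int = 3,
    where: Optional[Callable[[dict], bool]] = None,
) -> list[tuple[float, dict]]:
    """Brute-force top-k search: filter on metadata first, then rank by cosine similarity."""
    candidates = [r for r in records if where is None or where(r["metadata"])]
    scored = [(cosine(r["vector"], query), r) for r in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy three-dimensional vectors standing in for real embeddings.
records = [
    {"id": "a", "vector": [1.0, 0.0, 0.0], "metadata": {"user": "alice"}},
    {"id": "b", "vector": [0.9, 0.1, 0.0], "metadata": {"user": "bob"}},
    {"id": "c", "vector": [0.0, 1.0, 0.0], "metadata": {"user": "alice"}},
]
print([r["id"] for _, r in search(records, [1.0, 0.0, 0.0], k=2)])  # ['a', 'b']
only_alice = search(records, [1.0, 0.0, 0.0], k=2, where=lambda m: m["user"] == "alice")
print([r["id"] for _, r in only_alice])  # ['a', 'c']
```

Filtering before scoring, as here, guarantees k matching results whenever enough exist. An engine that filters only after an approximate search can return fewer, which is one reason first-class filtering is a real differentiator between stores.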
For teams whose data already lives in Postgres, pgvector is the pragmatic answer because it adds no new infrastructure. Vector similarity search runs as a native extension, so your agent's memory sits in the same transactional database as the rest of your application. LanceDB is the sleeper worth knowing: an embedded, multimodal retrieval engine that runs in-process with no server to operate, ideal for agents that need fast local memory without standing up a database at all.
The Outlier
One repository deserves its own category. GPTCache is not memory in the cognitive sense — it is a semantic cache, recognizing that a near-identical question has been asked before and returning the stored answer instead of paying for the model call again. For any agent operating at volume, this is the cheapest memory of all: the memory of what it has already said.
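The mechanism is easy to sketch. Embed each incoming question and compare it with the questions already answered. When the best similarity clears a threshold, return the stored answer; otherwise call the model and cache the result. Everything here is an assumption for illustration, not GPTCache's API. `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, `fake_model` is a hypothetical model call, and the 0.9 threshold is arbitrary.

```python
import math
from typing import Callable, Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0


def toy_embed(text: str) -> list[float]:
    # Hypothetical stand-in for an embedding model: word counts over a tiny fixed vocabulary.
    vocab = ["capital", "france", "germany", "weather", "paris"]
    words = text.lower().replace("?", "").split()
    return [float(words.count(w)) for w in vocab]


class SemanticCache:
    """Return a stored answer when a new question is close enough to an earlier one."""

    def __init__(self, embed: Callable[[str], list[float]], threshold: float = 0.9) -> None:
        self.embed = embed
        self.threshold = threshold
        self.entries: list[tuple[list[float], str]] = []  # (question vector, answer)

    def lookup(self, question: str) -> Optional[str]:
        qv = self.embed(question)
        best_score, best_answer = -1.0, None
        for vector, answer in self.entries:
            score = cosine(qv, vector)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def get_or_call(self, question: str, call_model: Callable[[str], str]) -> str:
        cached = self.lookup(question)
        if cached is not None:
            return cached  # cache hit: no model call
        answer = call_model(question)  # cache miss: pay for the call once
        self.entries.append((self.embed(question), answer))
        return answer


calls = []


def fake_model(question: str) -> str:
    # Hypothetical model call that records each invocation.
    calls.append(question)
    return f"answer to: {question}"


cache = SemanticCache(toy_embed, threshold=0.9)
cache.get_or_call("What is the capital of France?", fake_model)  # miss
cache.get_or_call("what's the capital of france", fake_model)  # hit, same answer
cache.get_or_call("What is the capital of Germany?", fake_model)  # miss
print(len(calls))  # 2
```

The threshold carries the whole trade-off. Set it too low and the cache hands back the answer to a different question; set it too high and it rarely hits.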
The honest summary is that agent memory is unsolved. Every project here makes a different bet about what matters — facts versus graphs, speed versus structure, recall versus forgetting — and the right answer depends entirely on what your agent is for. Pick the layer that matches your problem, put a real vector store beneath it, and remember that the hardest part is not storing memories but knowing which ones to throw away.


