Agent Memory and State

An agent without memory is a brilliant amnesiac: capable in the moment, useless across time. The context window is not memory — it is working memory at best, and an expensive one. Real agent memory means deciding what to persist, how to retrieve it, and crucially when to forget. These nine repositories represent the current state of that art, from the vector stores that hold the embeddings to the memory layers that decide what is worth holding at all.

The Memory Layers

The most direct attack on the problem is a dedicated memory layer that sits between your agent and its model. Mem0 has become the most-starred answer: a universal memory layer that extracts, stores, and recalls salient facts across sessions, so the agent that helped you yesterday recognizes you today. Letta — the project that grew out of the MemGPT research — takes the boldest position, treating memory management as a first-class operating-system concern with the agent paging information in and out of a hierarchical store.

▟ mem0ai/mem0

A universal memory layer that extracts and recalls salient facts across sessions, giving agents continuity without stuffing the whole history into context.

★ 58k · Python · mem0ai/mem0

▟ letta-ai/letta

The platform born from the MemGPT research — stateful agents that manage their own tiered memory like an OS pages RAM, learning and self-improving over time.

★ 23k · Python · letta-ai/letta
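The extract, store, and recall loop that a memory layer performs can be sketched in a few lines. This is a toy illustration under stated assumptions, not the API of Mem0 or Letta: `ToyMemoryLayer`, `remember`, and `recall` are hypothetical names, the "extraction" step is a naive rule (keep first-person sentences) standing in for an LLM-based extractor, and recall ranks by keyword overlap rather than embeddings.

```python
import re
from collections import defaultdict


class ToyMemoryLayer:
    """Hypothetical per-user fact store; not any real library's API."""

    def __init__(self):
        # user_id -> list of remembered fact strings
        self._facts = defaultdict(list)

    def remember(self, user_id, message):
        """Extract 'salient' sentences and persist them for this user."""
        for sentence in re.split(r"(?<=[.!?])\s+", message.strip()):
            # Naive salience rule (assumption): first-person statements only.
            is_salient = sentence.startswith(("I ", "My "))
            if is_salient and sentence not in self._facts[user_id]:
                self._facts[user_id].append(sentence)

    def recall(self, user_id, query, limit=3):
        """Return stored facts sharing the most words with the query."""
        query_words = set(re.findall(r"\w+", query.lower()))
        scored = []
        for fact in self._facts[user_id]:
            overlap = len(query_words & set(re.findall(r"\w+", fact.lower())))
            if overlap:
                scored.append((overlap, fact))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [fact for _, fact in scored[:limit]]


memory = ToyMemoryLayer()
# Session one: the user mentions a preference in passing.
memory.remember("alice", "Hello there. I prefer vegetarian food. What's the weather?")
# Session two: only the relevant fact is pulled back into context.
facts = memory.recall("alice", "Which food should I order?")
```

The point of the design is that only the recalled facts, not the full transcript, would be placed in the model's context on the next turn.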

Where Mem0 and Letta think in facts, Cognee and Zep think in structure. Cognee builds a self-hosted knowledge graph so an agent's memories are connected rather than merely retrieved, and Zep layers a temporal knowledge graph that understands how facts change over time — that your address from last year is no longer your address now, a distinction flat vector search cannot make.

▟ topoteretes/cognee

An open-source memory platform that builds a knowledge graph from an agent's history, giving persistent, connected recall across sessions.

★ 18k · Python · topoteretes/cognee

▟ getzep/zep

A memory service built on a temporal knowledge graph that tracks how facts evolve over time — so an agent knows which version of the truth is current.

★ 5k · Python · getzep/zep
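The address example can be made concrete with validity intervals: each fact records when it became true and, once superseded, when it stopped being true. The sketch below is a simplified stand-in, not Zep's data model or API; `TemporalFact`, `TemporalFactStore`, `assert_fact`, and `value_at` are hypothetical names, and the store assumes facts are asserted in chronological order.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class TemporalFact:
    subject: str
    predicate: str
    value: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "still true"


class TemporalFactStore:
    """Hypothetical store illustrating validity intervals."""

    def __init__(self):
        self._facts = []

    def assert_fact(self, subject, predicate, value, as_of):
        """Record a new value and close the interval of the one it replaces."""
        for fact in self._facts:
            same_key = (fact.subject, fact.predicate) == (subject, predicate)
            if same_key and fact.valid_to is None:
                fact.valid_to = as_of
        self._facts.append(TemporalFact(subject, predicate, value, as_of))

    def value_at(self, subject, predicate, when):
        """Return the value that was valid on the given date, if any."""
        for fact in self._facts:
            if (fact.subject, fact.predicate) != (subject, predicate):
                continue
            ended = fact.valid_to is not None and when >= fact.valid_to
            if fact.valid_from <= when and not ended:
                return fact.value
        return None


store = TemporalFactStore()
store.assert_fact("alice", "address", "12 Oak St", date(2023, 3, 1))
store.assert_fact("alice", "address", "98 Pine Ave", date(2024, 6, 15))

last_year = store.value_at("alice", "address", date(2023, 9, 1))  # "12 Oak St"
current = store.value_at("alice", "address", date(2025, 1, 1))    # "98 Pine Ave"
```

A flat vector index would return both addresses as near-identical matches; the interval is what lets the agent tell which one is current.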

The Vector Stores Underneath

Most memory ultimately rests on similarity search, and the vector database you choose shapes everything above it. Chroma is the developer-first default — embed, store, query, with almost no ceremony — and its rewrite into Rust made it genuinely fast. Qdrant occupies the performance tier: a Rust-built engine with rich filtering that scales from a laptop to a cluster without changing your code.

▟ chroma-core/chroma

A developer-friendly, now Rust-powered vector database that makes embedding-backed memory a three-line affair — the easy default for agent retrieval.

★ 28k · Rust · chroma-core/chroma

▟ qdrant/qdrant

A high-performance, massive-scale vector search engine with powerful payload filtering — the choice when memory needs to be both large and fast.

★ 32k · Rust · qdrant/qdrant
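To show what "similarity search with payload filtering" means mechanically, here is a minimal brute-force sketch in plain Python. It is not the Chroma or Qdrant API: `ToyVectorStore`, `add`, and `query` are hypothetical names, the three-dimensional vectors are hand-written stand-ins for real embeddings, and a production engine would use an approximate nearest-neighbor index instead of scanning every row.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store: embed elsewhere, add, then query."""

    def __init__(self):
        self._rows = []  # (item_id, vector, payload)

    def add(self, item_id, vector, payload):
        self._rows.append((item_id, vector, payload))

    def query(self, vector, top_k=2, where=None):
        """Rank rows by similarity, keeping only rows whose payload matches."""
        where = where or {}
        candidates = [
            (cosine_similarity(vector, row_vector), item_id)
            for item_id, row_vector, payload in self._rows
            if all(payload.get(key) == value for key, value in where.items())
        ]
        candidates.sort(reverse=True)
        return [item_id for _, item_id in candidates[:top_k]]


store = ToyVectorStore()
store.add("m1", [1.0, 0.0, 0.0], {"user": "alice"})
store.add("m2", [0.9, 0.1, 0.0], {"user": "bob"})
store.add("m3", [0.0, 1.0, 0.0], {"user": "alice"})

nearest = store.query([1.0, 0.0, 0.0], top_k=2)
alice_only = store.query([1.0, 0.0, 0.0], top_k=2, where={"user": "alice"})
```

Without a filter the two closest vectors win regardless of owner; with `where={"user": "alice"}` the search is restricted to one user's memories before ranking, which is the behavior payload filtering provides.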

For teams whose data already lives in Postgres, pgvector is the pragmatic answer that avoids a whole new piece of infrastructure: vector similarity search as a native extension, so your agent's memory sits in the same transactional database as the rest of your application. And LanceDB is the sleeper worth knowing — an embedded, multimodal retrieval engine that runs in-process with no server to operate, ideal for agents that need fast local memory without standing up a database at all.

▟ pgvector/pgvector

Vector similarity search as a native Postgres extension — keep agent memory in the same battle-tested database as the rest of your data.

★ 22k · C · pgvector/pgvector

▟ lancedb/lancedb

An embedded, serverless retrieval library for multimodal data that runs in-process — fast local agent memory with nothing to deploy.

★ 11k · Rust · lancedb/lancedb

The Outlier

One repository deserves its own category. GPTCache is not memory in the cognitive sense — it is a semantic cache, recognizing that a near-identical question has been asked before and returning the stored answer instead of paying for the model call again. For any agent operating at volume, this is the cheapest memory of all: the memory of what it has already said.

▟ zilliztech/GPTCache

A semantic cache for LLM responses that recognizes paraphrased repeat queries and serves stored answers — the memory that saves money rather than context.

★ 8k · Python · zilliztech/GPTCache
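The semantic-cache idea reduces to: embed the incoming question, compare it with previously answered questions, and reuse the stored answer when similarity clears a threshold. The sketch below is an illustration under stated assumptions, not GPTCache's API: `ToySemanticCache`, `toy_embed`, `expensive_model`, and `ask` are hypothetical names, the bag-of-words "embedding" stands in for a real embedding model, and the 0.8 threshold is arbitrary.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Bag-of-words counts; a stand-in for a real embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToySemanticCache:
    """Hypothetical cache keyed by meaning rather than exact text."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self._entries = []  # (embedding, answer)

    def lookup(self, question):
        """Return the best stored answer if it is similar enough, else None."""
        query = toy_embed(question)
        best_score, best_answer = 0.0, None
        for embedding, answer in self._entries:
            score = cosine_similarity(query, embedding)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def store(self, question, answer):
        self._entries.append((toy_embed(question), answer))


model_calls = []  # records every question that reached the "model"


def expensive_model(question):
    """Placeholder for a paid LLM call."""
    model_calls.append(question)
    return f"answer to: {question}"


def ask(cache, question):
    """Serve from the cache when possible; otherwise call the model and store."""
    cached = cache.lookup(question)
    if cached is not None:
        return cached
    answer = expensive_model(question)
    cache.store(question, answer)
    return answer


cache = ToySemanticCache(threshold=0.8)
first = ask(cache, "What is the capital of France?")
second = ask(cache, "What is the capital city of France?")  # near-duplicate: cache hit
third = ask(cache, "How do I reset my password?")           # unrelated: model call
```

The threshold is the design decision that matters: set it too low and the cache returns answers to questions that only look alike, set it too high and paraphrases miss.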

The honest summary is that agent memory is unsolved. Every project here makes a different bet about what matters — facts versus graphs, speed versus structure, recall versus forgetting — and the right answer depends entirely on what your agent is for. Pick the layer that matches your problem, put a real vector store beneath it, and remember that the hardest part is not storing memories but knowing which ones to throw away.
