Three Places to Keep an Agent's Memory

If you've shopped for agent memory, you've seen the benchmark fight: a leaderboard of LongMemEval scores, with library X beating library Y by a few points of recall on a synthetic conversation. It's the wrong axis. The accuracy numbers shift every quarter and tend to converge anyway. The decision that actually sticks is architectural, and it's a question of place: where does the remembering live relative to the loop that does the thinking?

There are three answers shipping right now, and they're not really substitutes. They're different shapes.

Memory as a library you import

▟ mem0ai/mem0

A "universal memory layer" you add to an existing agent — extracts facts from turns and stores them across a vector index, a graph, and a key-value store, exposed as add/search calls in Python or Node

★ 59k · Python · mem0ai/mem0

This is the least invasive shape. Your agent is already built; you bolt memory on as a dependency. After each exchange you call add(), before each one you search(), and a hybrid store decides what was worth keeping. The appeal is that nothing about your control flow changes — memory is a side table you write to and read from. The cost is that you own the discipline. Mem0 will store what you hand it, but deciding when to write, what to retrieve, and how much to inject back into the prompt is your loop's job, not the library's. It's the right shape when you have an agent you like and a forgetting problem you want to patch without a rewrite.
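To make the "side table" idea concrete, here is a minimal sketch of the read-before, write-after loop. ToyMemory is a hypothetical in-memory stand-in, not mem0's actual API: it stores raw text instead of extracted facts and scores by word overlap instead of vector similarity. The run_turn helper and the llm callable are likewise assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMemory:
    """Hypothetical stand-in for a memory library; NOT mem0's real API."""
    facts: dict[str, list[str]] = field(default_factory=dict)

    def add(self, text: str, user_id: str) -> None:
        # A real library would extract facts first; this toy stores the raw text.
        self.facts.setdefault(user_id, []).append(text)

    def search(self, query: str, user_id: str, limit: int = 3) -> list[str]:
        # Word overlap stands in for vector similarity.
        q = set(query.lower().split())
        scored = [(len(q & set(f.lower().split())), f)
                  for f in self.facts.get(user_id, [])]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [f for _, f in scored[:limit]]


def run_turn(memory: ToyMemory, llm, user_id: str, user_msg: str) -> str:
    """One agent turn: the loop, not the library, decides when to read and write."""
    recalled = memory.search(user_msg, user_id=user_id)   # read before the call
    prompt = "Known about user:\n" + "\n".join(recalled) + f"\n\nUser: {user_msg}"
    reply = llm(prompt)
    memory.add(user_msg, user_id=user_id)                 # write after the call
    return reply
```

Notice where the decisions sit: what to pass to add(), how many results to request, and how to format them into the prompt all live in run_turn. That is the discipline you own in this shape.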

Memory as a runtime your agent lives in

▟ letta-ai/letta

The framework formerly known as MemGPT — a server where agents are stateful by default, with a small always-in-context "core memory" the model can edit and a larger archival store it pages in and out itself

★ 23.4k · Python · letta-ai/letta

Letta inverts the relationship. You don't add memory to your agent; you run your agent inside a system where memory management is a first-class primitive. The model gets a small, always-visible core memory (think of it as RAM the agent can rewrite about itself and its user) plus an archival tier it retrieves from deliberately, with tool calls it issues itself. The lineage matters: this is the production descendant of the MemGPT paper, which argued that an LLM should manage its own context the way an operating system pages data between fast and slow memory. The trade is ownership of the runtime. You adopt Letta's agent model, not just its storage. That's a real commitment, and it's the right one for long-running autonomous agents that need to curate their own state over weeks rather than have a host application do it for them.
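A rough sketch of the two-tier idea, under loud assumptions: ToyAgentState, its tool names, and the 200-character limit are all hypothetical and only loosely modeled on the core/archival split described above. None of it is Letta's API. The point it illustrates is that the model, through tool calls, is the one editing memory, and that only the core tier is rendered into every prompt.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgentState:
    """Hypothetical two-tier memory; names and limits are illustrative only."""
    core: dict[str, str] = field(
        default_factory=lambda: {"persona": "", "human": ""})
    archival: list[str] = field(default_factory=list)
    core_limit: int = 200  # max characters per core block, so it always fits

    # --- tools the model would call itself ---
    def core_replace(self, block: str, old: str, new: str) -> str:
        # An empty `old` means "append"; otherwise swap old text for new.
        current = self.core[block]
        updated = current.replace(old, new) if old else current + new
        if len(updated) > self.core_limit:
            return "error: core block full; move details to archival memory"
        self.core[block] = updated
        return "ok"

    def archival_insert(self, text: str) -> str:
        self.archival.append(text)
        return "ok"

    def archival_search(self, query: str) -> list[str]:
        # Substring match stands in for real retrieval.
        return [t for t in self.archival if query.lower() in t.lower()]

    def render_context(self) -> str:
        # Only core memory is injected on every step; archival stays out
        # of the prompt until the model searches for it.
        return "\n".join(f"[{name}] {text}" for name, text in self.core.items())


def apply_tool_call(state: ToyAgentState, call: dict):
    """Dispatch one model-issued call shaped like {"name": ..., "args": {...}}."""
    tools = {
        "core_replace": state.core_replace,
        "archival_insert": state.archival_insert,
        "archival_search": state.archival_search,
    }
    return tools[call["name"]](**call["args"])
```

The size limit on core blocks is the design choice worth noticing: because the always-in-context tier is bounded, the agent is pushed to decide what deserves the scarce space and what gets paged out to archival.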

A library lets your agent have a memory. A runtime makes your agent be the thing that remembers.

Memory as a graph that knows what time it is

▟ getzep/graphiti

A temporal knowledge-graph engine: instead of overwriting facts, it records that a fact was true over an interval, so the agent can reason about how things changed and what was believed when

★ 27.6k · Python · getzep/graphiti

▟ topoteretes/cognee

An open-source memory platform that ingests mixed data and builds a self-hosted knowledge graph the agent queries, aimed at persistent recall across sessions without a managed backend

★ 18.1k · Python · topoteretes/cognee

The third shape treats memory as structure rather than storage. The problem both of these attack is the one vector search quietly fails at: contradiction over time. A user's job title, a project's status, a price — these don't have one true value, they have a history. A flat embedding store retrieves the most similar chunk and has no idea it's stale. Graphiti's bet is that you model facts as edges with valid-from and valid-to timestamps, so "where does she work" returns the right answer for the date you're asking about. Cognee generalizes the same instinct into an ingest-and-graph pipeline you self-host. This shape costs the most up front — you're maintaining a graph, not appending to an index — and it pays off precisely when your domain is full of facts that expire.
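The interval idea is small enough to sketch in plain Python. Everything here is a hypothetical illustration, not Graphiti's or Cognee's API: FactEdge, ToyTemporalGraph, and their method names are invented, and the sketch assumes each (subject, relation) pair has one value at a time and that facts arrive in chronological order.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class FactEdge:
    subject: str
    relation: str
    obj: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "still true"


class ToyTemporalGraph:
    """Hypothetical store that closes old facts instead of overwriting them."""

    def __init__(self) -> None:
        self.edges: list[FactEdge] = []

    def assert_fact(self, subject: str, relation: str, obj: str, as_of: date) -> None:
        # End the currently open interval for this (subject, relation), if any.
        for e in self.edges:
            if e.subject == subject and e.relation == relation and e.valid_to is None:
                e.valid_to = as_of
        self.edges.append(FactEdge(subject, relation, obj, valid_from=as_of))

    def query(self, subject: str, relation: str, at: date) -> Optional[str]:
        # Return the value whose half-open interval [valid_from, valid_to) covers `at`.
        for e in self.edges:
            if (e.subject == subject and e.relation == relation
                    and e.valid_from <= at
                    and (e.valid_to is None or at < e.valid_to)):
                return e.obj
        return None


graph = ToyTemporalGraph()
graph.assert_fact("maria", "works_at", "Acme", date(2021, 1, 1))
graph.assert_fact("maria", "works_at", "Globex", date(2023, 6, 1))
print(graph.query("maria", "works_at", date(2022, 3, 1)))  # Acme
print(graph.query("maria", "works_at", date(2024, 1, 1)))  # Globex
```

Both edges survive the update, which is the whole point: the old employer is not deleted, it just stops being true on a known date. A flat vector store holding the same two sentences would have no way to tell which one applies.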

How to actually choose

Don't start from the leaderboard. Start from your loop:

You have a working agent and want it to stop repeating itself → the library shape (mem0). Smallest diff, you keep control.
You're building a long-lived autonomous agent that should manage its own state → the runtime shape (Letta). Adopt the model, get self-editing memory for free.
Your facts change and "when was this true" is a real question → the graph shape (Graphiti, Cognee). Pay the structure tax, get temporal correctness.

And if you'd rather understand the machinery before you adopt any of it, read the field first:

▟ NirDiamant/Agent_Memory_Techniques

Thirty runnable notebooks walking through conversation buffers, vector stores, knowledge graphs, episodic vs. semantic memory, and the MemGPT/Mem0/Zep patterns: the concepts under all the products

★ 538 · Jupyter Notebook · NirDiamant/Agent_Memory_Techniques

The libraries will keep trading benchmark points. The shapes won't move. Pick where memory should live, and the leaderboard stops mattering.

Dex Mareno
