The default mental model of agent memory is a filing cabinet: whatever the agent learns, write it down; when it needs something, search. It is intuitive, it demos beautifully, and it is the exact design that falls apart in production. The failure is not that the cabinet fills up and costs money to store. It is subtler and worse: as the store grows, the agent gets less reliable, because every query now competes against the stale and contradictory versions of the fact it is trying to recall.
Say a user tells your agent in March that they are vegetarian, in June that they've gone fully vegan, and in between rephrases it three ways. A naive store now holds five overlapping claims. A retrieval for "what can this person eat" pulls back some mixture of them, and the model has to adjudicate a contradiction you handed it. Multiply that across months of interaction and the memory doesn't augment the agent — it poisons it.
Which is why the real subject of every production memory system is not storage. It is consolidation: the loop that turns raw history into a compact, non-contradictory set of durable facts. And the interesting thing about the current crop of systems is that they mostly agree on what consolidation has to do — extract atomic facts, merge duplicates, resolve contradictions, drop the dead weight — and disagree almost entirely on who runs that loop and when.
Mem0: resolve the contradiction at write time
Mem0 puts the loop inline. When a new exchange arrives, it extracts candidate facts, and for each one retrieves the semantically similar memories it already holds. Then — rather than writing if/else rules for merging — it hands the candidate and its neighbors to an LLM and asks it to pick a tool: ADD a genuinely new fact, UPDATE an existing one with more detail, DELETE a memory the new fact contradicts, or NOOP if it's a repeat.
That four-way choice is the whole design in miniature. Forgetting isn't a background sweep or a storage quota; it's the DELETE branch, decided per fact, at the moment the contradiction appears. "I moved to Lisbon" doesn't get filed next to "I live in Berlin" — it evicts it. The store stays small because it stays resolved.
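To make the four-way choice concrete, here is a minimal sketch of the apply step. This is not Mem0's API: the `Decision` and `MemoryStore` names are hypothetical, and the list of decisions is hand-written where a real system would ask an LLM to choose the operation for each candidate fact.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Decision:
    op: str                          # "ADD", "UPDATE", "DELETE", or "NOOP"
    target_id: Optional[int] = None  # existing memory the op applies to
    text: Optional[str] = None       # new or revised fact text


@dataclass
class MemoryStore:
    facts: Dict[int, str] = field(default_factory=dict)
    next_id: int = 0

    def apply(self, d: Decision) -> None:
        if d.op == "ADD":
            self.facts[self.next_id] = d.text
            self.next_id += 1
        elif d.op == "UPDATE":
            self.facts[d.target_id] = d.text
        elif d.op == "DELETE":
            del self.facts[d.target_id]
        elif d.op != "NOOP":
            raise ValueError(f"unknown op: {d.op}")


store = MemoryStore()
store.apply(Decision("ADD", text="Lives in Berlin"))  # stored as id 0
store.apply(Decision("ADD", text="Is vegan"))         # stored as id 1

# Hand-written stand-in for the LLM's choices after the user says
# "I moved to Lisbon", adds a detail about their diet, and repeats themselves.
decisions: List[Decision] = [
    Decision("DELETE", target_id=0),                  # Berlin is contradicted
    Decision("ADD", text="Lives in Lisbon"),          # genuinely new fact
    Decision("UPDATE", target_id=1, text="Is vegan and avoids honey"),
    Decision("NOOP"),                                 # the repeat changes nothing
]
for d in decisions:
    store.apply(d)
```

After the loop the store holds two facts, not five. Everything hard lives in choosing the operation; applying it is a few lines.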
Zep: don't delete the fact, expire it
Zep, built on its Graphiti engine, agrees that contradictions must be resolved but refuses to throw the old fact away. Its knowledge graph is bi-temporal: every fact carries two clocks — when it was true in the world, and when the system learned it. When a new edge contradicts an existing one, Zep uses an LLM to detect the conflict and then invalidates the old edge, stamping the moment it stopped being true, instead of removing it.
The difference between deleting a fact and expiring it is the difference between an agent that only knows the present and one that can reason about how the present came to be.
This is the sharpest idea in the space. A deleted fact is gone; an expired one is retired. The agent can still answer "where did they live before Lisbon," audit a correction, or reason about a fact that was true for a bounded window — which is exactly the shape of most real-world knowledge. Forgetting here means demoting from the present, not erasing from the record.
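A rough sketch of expire-instead-of-delete, under stated assumptions: the `Edge` and `TemporalGraph` classes and their field names are illustrative rather than Graphiti's schema, and the conflict check is a naive "same subject and predicate, different object" rule standing in for the LLM-based detection described above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Edge:
    subject: str
    predicate: str
    obj: str
    valid_at: datetime                     # when it became true in the world
    created_at: datetime                   # when the system learned it
    invalid_at: Optional[datetime] = None  # when it stopped being true


class TemporalGraph:
    def __init__(self) -> None:
        self.edges: List[Edge] = []

    def assert_fact(self, subject, predicate, obj, valid_at, created_at) -> None:
        # Naive conflict rule (assumption): a live edge with the same subject
        # and predicate but a different object is contradicted by the new fact.
        for e in self.edges:
            conflicts = (e.subject == subject and e.predicate == predicate
                         and e.obj != obj and e.invalid_at is None)
            if conflicts:
                e.invalid_at = valid_at    # expire the old edge, keep the record
        self.edges.append(Edge(subject, predicate, obj, valid_at, created_at))

    def as_of(self, subject, predicate, when) -> List[str]:
        # Facts that were true in the world at time `when`.
        return [e.obj for e in self.edges
                if e.subject == subject and e.predicate == predicate
                and e.valid_at <= when
                and (e.invalid_at is None or when < e.invalid_at)]


g = TemporalGraph()
g.assert_fact("user", "lives_in", "Berlin",
              valid_at=datetime(2021, 5, 1), created_at=datetime(2024, 1, 10))
g.assert_fact("user", "lives_in", "Lisbon",
              valid_at=datetime(2024, 6, 1), created_at=datetime(2024, 6, 15))

now_answer = g.as_of("user", "lives_in", datetime(2025, 1, 1))
past_answer = g.as_of("user", "lives_in", datetime(2023, 1, 1))
```

Both edges survive. A query about the present returns only Lisbon, while a query about 2023 still returns Berlin, which is the "where did they live before" question a delete-based store cannot answer.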
Letta: make the agent do it
Letta, which grew out of the MemGPT work that framed the LLM as an operating system paging memory in and out of a limited context, hands the loop to the agent itself. The agent gets explicit tools to edit its own working memory and its own archive — append a fact, insert into long-term storage, search it back — so what to remember becomes a decision the model makes, not a pipeline it's subjected to.
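The shape of that idea, sketched with hypothetical names: the `AgentMemory` class and the tool names below are illustrative and are not Letta's actual interface. The point is only that memory edits arrive as tool calls the model chooses to make, and that the small working memory pushes back when it is full.

```python
from typing import Callable, Dict, List


class AgentMemory:
    def __init__(self, core_limit: int = 60) -> None:
        self.core: List[str] = []      # small working memory kept in context
        self.archive: List[str] = []   # larger long-term store outside it
        self.core_limit = core_limit   # character budget for working memory
        self.tools: Dict[str, Callable[..., object]] = {
            "core_append": self.core_append,
            "archival_insert": self.archival_insert,
            "archival_search": self.archival_search,
        }

    def core_append(self, text: str) -> str:
        used = sum(len(t) for t in self.core)
        if used + len(text) > self.core_limit:
            # The error goes back to the model, which decides what to evict.
            return "error: core memory full, archive something first"
        self.core.append(text)
        return "ok"

    def archival_insert(self, text: str) -> str:
        self.archive.append(text)
        return "ok"

    def archival_search(self, query: str) -> List[str]:
        # Keyword match as a stand-in for embedding search (assumption).
        words = query.lower().split()
        return [t for t in self.archive if any(w in t.lower() for w in words)]

    def call(self, name: str, **args) -> object:
        # Dispatch a tool call emitted by the model.
        return self.tools[name](**args)


mem = AgentMemory(core_limit=60)
r1 = mem.call("core_append", text="User is vegan.")
r2 = mem.call("archival_insert", text="Moved from Berlin to Lisbon in June.")
r3 = mem.call("core_append", text="x" * 100)   # exceeds the budget
hits = mem.call("archival_search", query="lisbon")
```

The third call fails on purpose: a bounded working memory is what forces the agent to make a choice instead of appending forever.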
The twist worth watching is the sleep-time agent: a background process that shares memory with the primary agent and reorganizes it while the user is idle — compacting history, merging facts, pre-deriving conclusions. That is sleep-time compute applied to memory, and it moves consolidation off the critical path entirely. The foreground agent answers fast; the tidying happens between requests, the way biological memory does its housekeeping offline rather than mid-conversation.
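As a toy illustration of moving consolidation off the critical path, the sketch below splits memory into a cheap append on the foreground path and a merge pass that runs only when the user is idle. All names are hypothetical, and the "latest observation wins" merge rule is a deliberate simplification of whatever a real background agent would do with an LLM.

```python
from typing import Dict, List, Tuple


class SharedMemory:
    def __init__(self) -> None:
        self.pending: List[Tuple[str, str]] = []  # raw observations, unmerged
        self.facts: Dict[str, str] = {}           # consolidated key -> value


def foreground_record(mem: SharedMemory, key: str, value: str) -> None:
    # Hot path: append only, no merging, so the reply is never delayed.
    mem.pending.append((key, value))


def sleep_time_pass(mem: SharedMemory) -> int:
    # Background path: fold pending observations into durable facts.
    # Simplified rule (assumption): the most recent observation wins.
    merged = 0
    for key, value in mem.pending:
        mem.facts[key] = value
        merged += 1
    mem.pending.clear()
    return merged


mem = SharedMemory()
foreground_record(mem, "diet", "vegetarian")
foreground_record(mem, "diet", "vegetarian")   # a rephrased repeat
foreground_record(mem, "diet", "vegan")
foreground_record(mem, "city", "Lisbon")

merged_count = sleep_time_pass(mem)            # run while the user is idle
```

Four raw observations collapse into two facts, and none of that work happened while the user was waiting on an answer.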
Anthropic's memory tool: a directory and a policy
Anthropic's memory tool, now generally available on the Messages API, is the minimalist end of the spectrum. There is no graph and no extraction pipeline — just a client-side /memories directory the model curates with plain file operations (view, create, str_replace, insert, delete, rename). The model checks the directory before a task and writes back what it learns, and the whole thing is deliberately just files you store however you like.
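Because the directory lives on your side, you write the handler that executes those file operations. The following is a minimal sketch, not Anthropic's reference implementation: the `MemoryDir` class is hypothetical, and the argument names (`file_text`, `old_str`, `insert_line`, and so on) are assumptions you should check against the current tool documentation. The one part worth copying regardless is the path check that keeps the model inside its directory.

```python
import tempfile
from pathlib import Path


class MemoryDir:
    def __init__(self, root) -> None:
        self.root = Path(root).resolve()
        self.root.mkdir(parents=True, exist_ok=True)

    def _resolve(self, path: str) -> Path:
        # Map "/memories/x" onto a file under root and reject traversal.
        rel = path.removeprefix("/memories").lstrip("/")
        full = (self.root / rel).resolve()
        if full != self.root and self.root not in full.parents:
            raise ValueError(f"path escapes memory directory: {path}")
        return full

    def handle(self, command: str, **args) -> str:
        if command == "view":
            p = self._resolve(args["path"])
            if p.is_dir():
                return "\n".join(sorted(c.name for c in p.iterdir()))
            return p.read_text()
        if command == "create":
            p = self._resolve(args["path"])
            p.parent.mkdir(parents=True, exist_ok=True)
            p.write_text(args["file_text"])
            return "created"
        if command == "str_replace":
            p = self._resolve(args["path"])
            text = p.read_text()
            if text.count(args["old_str"]) != 1:
                raise ValueError("old_str must match exactly once")
            p.write_text(text.replace(args["old_str"], args["new_str"]))
            return "replaced"
        if command == "insert":
            p = self._resolve(args["path"])
            lines = p.read_text().splitlines()
            lines.insert(args["insert_line"], args["insert_text"])
            p.write_text("\n".join(lines) + "\n")
            return "inserted"
        if command == "delete":
            self._resolve(args["path"]).unlink()
            return "deleted"
        if command == "rename":
            src = self._resolve(args["old_path"])
            src.rename(self._resolve(args["new_path"]))
            return "renamed"
        raise ValueError(f"unknown command: {command}")


store = MemoryDir(tempfile.mkdtemp())
store.handle("create", path="/memories/diet.md", file_text="vegetarian\n")
store.handle("str_replace", path="/memories/diet.md",
             old_str="vegetarian", new_str="vegan")
store.handle("insert", path="/memories/diet.md",
             insert_line=1, insert_text="avoids honey")
contents = store.handle("view", path="/memories/diet.md")
store.handle("rename", old_path="/memories/diet.md", new_path="/memories/food.md")
listing = store.handle("view", path="/memories")
```

Note that the vegetarian-to-vegan correction is an in-place `str_replace`, so the file ends up holding one current claim rather than two competing ones.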
What makes it a consolidation system rather than a filing cabinet is the surrounding policy. Anthropic's own guidance tells you to cap file sizes and periodically delete memory files that haven't been accessed in a long time — a staleness TTL, stated plainly as a thing you are supposed to implement. Paired with context editing, which prunes old tool results from the live window, the design treats forgetting as a first-class operational concern rather than an emergent accident.
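A staleness sweep of that kind can be quite small. The sketch below is one possible shape, with assumptions called out: the `sweep` function and its thresholds are hypothetical, and it relies on filesystem access times, which many systems update lazily or not at all (`relatime` and `noatime` mounts), so a production version would more likely record last access in its own index.

```python
import os
import tempfile
import time
from pathlib import Path


def sweep(root, max_age_seconds, max_bytes, now=None):
    """Delete files not accessed within max_age_seconds; report oversized ones."""
    now = time.time() if now is None else now
    deleted, oversized = [], []
    for p in sorted(Path(root).rglob("*")):
        if not p.is_file():
            continue
        st = p.stat()
        if now - st.st_atime > max_age_seconds:
            p.unlink()                    # stale: past its TTL
            deleted.append(p.name)
        elif st.st_size > max_bytes:
            oversized.append(p.name)      # flag for compaction, don't truncate
    return deleted, oversized


# Demo in a throwaway directory with one backdated file.
root = Path(tempfile.mkdtemp())
(root / "fresh.md").write_text("vegan\n")
(root / "stale.md").write_text("lives in Berlin\n")
(root / "big.md").write_text("x" * 5000)

ninety_days_ago = time.time() - 90 * 86400
os.utime(root / "stale.md", (ninety_days_ago, ninety_days_ago))

deleted, oversized = sweep(root, max_age_seconds=30 * 86400, max_bytes=4096)
remaining = sorted(p.name for p in root.iterdir())
```

Oversized files are reported rather than cut, on the assumption that shrinking a memory file is a summarization job for the model and not something to do with a byte offset.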
The real lesson
Line these up and the disagreement about mechanism (LLM tool call, temporal graph, self-editing agent, curated file tree) matters less than the thing they all quietly assert: an agent's memory is only as good as its forgetting policy. (If you're actually choosing between the memory products, the head-to-head on Mem0, Zep, and Letta is the next stop.) The systems that improve with use are the ones that decided, on purpose, what to drop, when to drop it, and what "dropping" even means. The ones that rot are the ones that only ever learned to write. Store-everything was never the safe default. It was the bug wearing the costume of a feature.



