The memory library, read in order: the foundations (what agent memory is, and how it differs from state); the architecture call (memory or RAG?); where memory lives (filesystem vs vector store, the three places to keep it); the frameworks that manage it (Mem0, Zep, Letta, and the newer drop-ins); operating it (what an agent should forget and consolidate); the evaluation that tells you whether it works (LoCoMo, LongMemEval, BEAM); and the essays on why memory became the hard part.
Most teams buy one vector store and call it 'memory.' It solves exactly one of the four problems — which is why the agent still loses the thread and repeats yesterday's mistake.
Nine repositories tackling the hardest unsolved problem in agent design — remembering, retrieving, and forgetting across the lifetime of a conversation.
Both embed a query and pull matching text into the prompt, so they look like the same trick. The difference is who writes the index — and that single fact moves the hard problem from retrieval to write discipline.
The memory libraries aren't competing on accuracy. They're competing on geography — where the remembering happens relative to your agent's loop. Pick the place, not the benchmark.
The year's quietest architecture shift is agents moving their memory out of vector stores and into plain files. It isn't that memory got better — it's that teams stopped using a retrieval tool for a state problem.
Three popular open-source memory frameworks that look like rivals but are actually three different bets on where memory lives — and how much of your architecture you hand over.
TeleMem ships as a one-line replacement for Mem0 — import telemem as mem0 — and claims a 16-point accuracy edge. Read where that number comes from and you learn exactly which agent it's for.
Every serious agent-memory system is really a forgetting system. The hard part was never storing what the agent learns — it's pruning the contradictions and stale facts that quietly poison retrieval.
Bigger context windows don't fix forgetting. The benchmarks that actually test agent memory — LoCoMo and LongMemEval — and what their question categories reveal about where it breaks.
Mem0 says 92.5% on LoCoMo. Mastra says 95% on LongMemEval. Zep corrected its own 84% to 58%. They can't all be right — and the baseline that beats them all is the one no vendor charts.
The benchmarks that grade an agent's memory just moved the finish line from 9,000 tokens to 10 million — and the new one proves a million-token context window doesn't buy you long-term memory.
The industry has standardized how agents reach out to the world and ignored the harder question of what they keep — and that asymmetry is not an accident.
The hard problem of agent memory was never remembering. It's knowing when a remembered fact has quietly stopped being true.