A long-running agent rarely fails because the model got dumber. It fails because the model can no longer see the thing that matters. Forty tool calls in, the window is a landfill of stale search results, file dumps, and half-finished reasoning, and the one fact the agent needs is buried under 200,000 tokens of its own exhaust. The model is fine. The context is the problem.

This is why "just use a bigger window" is the wrong reflex. A million-token window does not make the agent recall better; past a point it makes recall worse, the failure mode we've covered as context rot. The window is not a bucket you fill — it is an attention budget that decays as it fills. The real question for a long-running agent is not how big the window is but how you make room in it, and Anthropic now ships three server-side levers for exactly that. The mistake almost everyone makes is treating them as three answers to the same question. They aren't. They are a division of labor — and the way to choose is to sort them by what kind of information you can afford to lose.

Three levers, three different losses

Clearing: evict what you can re-fetch

Context editing — strategy clear_tool_uses_20250919 — is the lightest touch. As input tokens cross a threshold (default near 100,000), the API walks the message list and replaces old tool_result blocks with a [cleared] placeholder, keeping the most recent few (default: 3) intact. Crucially, the tool_use call itself stays, so the model retains a record that it fetched something and the body was removed — it isn't a silent hole.

What makes this cheap is the nature of what it throws away: tool results are re-fetchable. In Anthropic's cookbook, clearing dropped seven of eight file reads from a research agent's context; if the agent needs one again, it just calls read_file again. Nothing is destroyed, only evicted. That run held a peak that would have hit 335,279 tokens down to 173,137 — roughly half — without the agent ever noticing a fact had gone missing. The one tax: clearing changes the prompt prefix, so it invalidates your cached tokens. The clear_at_least knob exists to make sure each eviction frees enough to be worth the cache re-write.

Compaction: keep the gist, lose the specifics

Compaction (the API's compact_20260112, default trigger 150,000 tokens; the same idea as Claude Code's /compact) is heavier. Instead of evicting re-fetchable blocks, it summarizes the whole transcript into a condensed <summary> and continues from that. It buys back the most room, and it is the only lever that compresses the agent's own reasoning, not just its tool output.

But summarizing is lossy in a specific, measurable way. In the same cookbook test, a compaction summary preserved 3 of 3 high-level facts — organism names, lifespans, which genetic tools exist — and 0 of 3 obscure specifics, like the values in an appendix table. That is the whole character of compaction in two numbers: it keeps what the work was about and discards the exact figures the work produced. Anthropic's engineers say it plainly — "the art of compaction lies in the selection of what to keep versus what to discard, as overly aggressive compaction can result in the loss of subtle but critical context whose importance only becomes apparent later." And unlike a cleared tool result, a detail dropped from a summary is not re-fetchable. It's gone.

Memory: the only thing that survives a reset

The memory tool (memory_20250818) is different in kind. It doesn't manage the window at all — it lets the model read and write files in a /memories directory that lives outside the window and persists across sessions. The model drives it with six commands (view, create, str_replace, insert, delete, rename), and Anthropic auto-injects a protocol prompt that tells the agent to assume interruption: your context window might be reset at any moment, so anything not written to memory is at risk.

That's the property neither other lever has. Clearing and compaction both produce something that still lives in-window — a placeholder, a summary — and both vanish on a context reset. A note in /memories survives. The cost is that every read and write is a tool round-trip: the model emits a call, your app executes it, a result comes back. Memory is slower than recall-from-context, but it is the only place a fact is safe.

The rule: sort the loss by its cost

Once you see the three this way, the choice stops being "which one" and becomes "which loss goes where."

Assign each kind of loss to the mechanism whose loss is cheapest. Re-fetchable bulk gets cleared. Reasoning you only need the gist of gets compacted. Specifics you must not lose get written to memory — before compaction can summarize them away.

The dangerous move is compaction alone, because it is the one lever that silently discards the exact details — the 0-of-3 — that only later turn out to matter. This is also the cleanest read on Anthropic's headline numbers. On its agentic-search eval, context editing alone delivered a 29% improvement; adding the memory tool pushed it to 39%. The delta isn't a second copy of the same trick. It's the memory tool catching the specifics the editing pass would otherwise have let fall on the floor. In a separate 100-turn web-search run, context editing cut token consumption 84% while letting the agent finish tasks that would otherwise have died of context exhaustion.

What this means for your loop

In practice you don't choose — you stack, in order of escalating cost. Turn on tool-result clearing first and set exclude_tools so it never evicts the memory tool's own output. Let compaction fire at a higher threshold than clearing, so cheap eviction happens before expensive summarization. And give the agent a memory file for the handful of facts and the running task state that have to outlive any reset, so the summary can be lossy without being fatal. When even that isn't enough, the fourth move is architectural: hand the bulky sub-task to a sub-agent with its own clean window and let it return a short result, keeping the orchestrator's context unpolluted.

The mental shift is the same one context engineering keeps asking for, made concrete. A long-running agent's context isn't storage you fill until it's full and then panic. It's a working set you actively curate — and these three levers are just three different answers to the only question that matters at the edge of the window: of everything in here, what can I most afford to lose?