There is a word in the agent-infrastructure vocabulary that quietly lies to you, and it's stateless. It suggests an agent with no memory, a goldfish that forgets you between turns. But a multi-turn agent always has state — the conversation so far, the plan it's three steps into, the results of the tools it already called. That state has to live somewhere. "Stateless vs stateful" is not a question of whether the state exists. It's a question of who stores it, and who pays to replay it.

Get that framing right and most of the architecture decisions downstream answer themselves.

Stateless: the client carries the whole history, every turn

The cleanest example is Anthropic's Messages API. The documentation is blunt about it: "The Messages API is stateless, which means that you always send the full conversational history to the API." There is no server-side thread, no session, no conversation ID. To continue a dialogue, you append the assistant's last turn to your messages array and re-send the entire thing. OpenAI's older Chat Completions endpoint works the same way: each call is independent and remembers nothing unless you shuttle the whole transcript back across the wire.

This sounds primitive. It is actually a feature. Because the model has no memory between calls, you have total authority over what it sees: you can summarize old turns, drop a tool result that bloated the window, inject a synthetic assistant message, or swap providers entirely without migrating any server-side state. The context window is a value you assemble fresh each turn, not a black box someone else mutates. For debugging and for context engineering — the actual craft of agent-building — that transparency is worth a lot.

The cost is arithmetic, and it's unforgiving. If each turn re-sends every prior turn, the cumulative input tokens over an n-turn conversation grow with the square of n. Turn 50 doesn't cost what turn 1 cost; it costs roughly fifty times as much, and you paid that escalating tax on all forty-nine turns before it. Prompt caching softens the slope, but the shape is still O(n²). A stateless agent is simple, portable, and inspectable — and it gets expensive precisely when it gets interesting.

The model has no memory between calls. That's not the bug. That's the only reason you get to decide what it remembers.

Stateful: the provider holds the history, you pass a pointer

The stateful answer moves the transcript off the client. OpenAI's Responses API is the marquee example: send store: true, get back a response ID, and on the next turn pass previous_response_id instead of the history. The platform reconstructs the context server-side. You send a pointer, not the transcript. Responses are retained for 30 days by default, browsable in the dashboard, retrievable by ID — handy, until you remember that "stored on OpenAI's servers for 30 days" is also a data-governance sentence.

Frameworks do the same thing one layer down. LangGraph's persistence checkpoints the graph's state after every node, keyed by a thread_id, into a backend you choose — Postgres, SQLite, memory. The agent can crash, redeploy, or pause for a day waiting on a human approval, then resume from the last checkpoint and reconstruct its state as if nothing happened. That durability — survive the restart, survive the wait — is the real prize of statefulness, more than the saved bandwidth.

Here is the non-obvious part, the one that catches teams who switch to a stateful API expecting a cheaper bill. Server-side state does not make the context free. Even when you chain with previous_response_id, OpenAI still bills every prior input token in the chain as input tokens, as practitioners noted early on. The model re-reads the conversation either way; statefulness just spares you from re-uploading it and re-managing it. You're buying bandwidth and developer ergonomics, not a discount on the tokens the model actually processes. Confuse the two and you'll budget for savings that never arrive.

The axis that actually decides it

Strip away the marketing and there's a single trade, the one laid out in the table above. Holding state on the client buys you control, portability, and inspectability, at the cost of bandwidth and an O(n²) replay bill. Holding it on the server (or in a checkpointer) buys you durability, simpler request payloads, and resume-after-crash, at the cost of lock-in and a context you can no longer freely reach in and edit.

Notice this is a different question from where you run the agent. You can run a stateless agent on a durable runtime, or a stateful one on plain serverless — see where to run a long-running AI agent for that orthogonal axis, and mem0 vs Zep vs Letta for the layer that adds cross-session memory on top of either. Even the protocols are converging on the stateless default for portability's sake: the MCP spec is retreating from server-assigned sessions, for exactly the reasons above.

So don't ask whether your agent should be "stateful." It has state regardless. Ask who you want holding it when the process dies at 3 a.m. mid-task — and whether you'll ever need to reach in and change what it remembers. Answer those two, and you've chosen.