There is a comfortable assumption hiding inside most agent demos: that deploying the thing is the easy part. You containerize the loop, point a load balancer at three replicas, and call it production. This works right up until the agent does the one thing a web request never does — it keeps running.
A web request is a sprint. It arrives, it computes, it answers, it forgets. An agent is a relay race that can pause for a human approval, resume a day later, crash halfway through step 37 of 50, and absolutely must not run up the bill twice. The deployment problem isn't "where does the code run." It's that the unit you are actually deploying is not the container — it's the run, and the run has a lifespan measured in hours or days, not milliseconds.
The thing that bites first in production isn't latency. It's that you shipped a new version while runs were still in flight.
Stop holding state in process memory#
The first deploy failure is the most boring one: the agent kept its conversation, scratchpad, and intermediate results in a local variable, and then the process restarted. Every in-flight run vanished.
The fix is to make state explicit, durable, and decoupled from the chat history — Google's own framing in its Agent Development Kit guidance. In practice that means persisting checkpoints to a store and keying them by a thread or session id, so the run is a row someone can pick back up — not a stack frame that dies with the worker. LangGraph makes this the hard requirement for durable execution: you don't get crash recovery unless you "specify a checkpointer that will save workflow progress" and a thread identifier to address it.
There's a subtler reason to externalize state, and it's specific to agents. When a paused agent resumes without durable state, the model doesn't just lose context — it confabulates. Google's ADK team documents the exact pathology: on resume "the model frequently hallucinates intermediate steps — it 'remembers' approvals that weren't given or skips steps it assumes were completed." A stateless web app that forgets returns an error. An agent that forgets invents a plausible past and acts on it. (For the day-to-day version of this problem, see managing context in a long-running agent.)
Choose a runtime that owns the session, not just the request#
Once state is external, "where does it run" becomes a question about session lifetime, not CPU. The managed runtimes have quietly converged on long-lived, isolated sessions:
- AWS Bedrock AgentCore gives every session its own dedicated microVM with isolated CPU, memory, and filesystem, runs it for up to 8 hours, and sanitizes the memory when the session ends.
- Google Cloud's Agent Runtime now supports long-running agents that maintain state for up to seven days, with checkpoint-and-resume and human-approval pauses that "consume zero compute resources" while waiting.
Seven days. Whatever you build on, that number should reframe your mental model: "the process stays up for the whole run" is no longer a safe assumption, so durability can't be something you bolt on at the end. If you'd rather own the layer, durable-execution engines like Temporal give the same guarantee a different way — record every step, and if a worker dies, another replays the history and resumes "exactly where it left off" rather than re-running completed work. A 50-step research task that survives a server restart isn't a nice-to-have; it's the baseline an agent runtime has to clear.
Make the rolling deploy survivable — this is the part everyone forgets#
Here is the failure that no amount of horizontal scaling prevents, and the real reason agents are hard to ship: you deploy v2 while v1's runs are still going.
A checkpoint is a serialized snapshot of v1's graph — its node names, its state schema, its assumptions. Change the graph and redeploy, and the state written by one version may simply not be readable by another. This isn't hypothetical hand-wringing; it's an open, acknowledged gap in LangGraph.js, where "changes to state structure can cause older persisted states to become incompatible with newer versions… leading to failures when resuming workflows from checkpoints." Round-robin a long-lived session across mixed versions and you get silent checkpoint corruption.
The discipline that prevents it borrows straight from stateful-service deployment:
- Version-stamp every run when it starts, and pin it to the code version that created it.
- Blue-green, not rolling, for the agent tier — traffic switches atomically so no session is ever served by two versions.
- Keep the old version live until its in-flight runs drain. You are not done deploying when the new pods are healthy; you're done when the last v1 run finishes or is migrated.
If you must change the state schema under a running fleet, treat it like a database migration: version the checkpoint format and write a forward-migration, or quarantine old runs to finish on old code. The mistake is assuming an agent deploy is atomic. It isn't — it's a window, and the window is as long as your longest run.
Bound the loop and make every side effect idempotent#
Two safeguards turn the remaining failure modes from incidents into shrugs. First, bound the run: a per-call timeout doesn't cap a loop that can call the model fifty times, so set a max step count and a wall-clock budget too (why a per-call timeout isn't enough). Second, make tool calls idempotent. Durable execution means resume-after-crash, and resume means a tool might run twice — so the call that sends an email or charges a card needs an idempotency key, or your reliability feature becomes a double-charge bug. Pair that with tool errors that return the failure instead of crashing the run, and the agent degrades instead of dying.
None of this is exotic infrastructure. It's the same stateful-systems engineering web shops learned twenty years ago — externalized state, version-aware rollout, idempotent writes, bounded work — applied to a process that happens to think out loud. The trap is the demo's framing: that deploying an agent is a packaging problem. It's a lifecycle problem. Get the run's lifecycle right and the container is the boring part again — which is exactly where you want it.
When you're ready to swap the model under the same harness, the discipline is the same as rolling out any new dependency: do it shadow, then canary, then A/B, never all at once.



