Cost / run (cached): $0.5078

Monthly cost (cached): $5,078

Saved / mo by caching: $16,673

Input that's the N² re-send: 70%

$0.5078/run × 10,000 runs = $5,078/mo with prefix caching. The same 20-step loop with no caching is $21,750/mo, because re-sending a growing context makes raw input scale with the square of the turn count; that N² term accounts for 70% of the raw input. Caching is what keeps agent cost near-linear.

How the estimate works

The LLM cost calculator prices one independent request. An agent is not one request: it is a loop of many, and each turn re-sends the entire conversation so far, meaning the fixed prefix (system prompt + tool schemas) plus everything the loop has appended. Turn t therefore reads base + (t−1)·growth tokens, and summed across an N-step run the input is N·base + growth·N(N−1)/2. The second term is quadratic in the step count: double the turns and that term roughly quadruples, taking most of the raw input bill with it once the run is long enough for growth to dominate. It is the cost that ambushes teams who budgeted from a per-call price, and the reasoning is laid out in why AI agent costs scale quadratically.
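The sum above can be checked with a short Python sketch. The function names and the token counts (a 4,000-token base, 1,000 tokens of growth per turn) are illustrative assumptions, not values taken from the calculator.

```python
def uncached_input_tokens(base: int, growth: int, steps: int) -> int:
    """Total input tokens when every turn re-sends the whole context."""
    # Turn t (1-indexed) reads base + (t - 1) * growth tokens.
    return sum(base + (t - 1) * growth for t in range(1, steps + 1))


def quadratic_term(growth: int, steps: int) -> int:
    """The growth * N(N-1)/2 part of the closed form."""
    return growth * steps * (steps - 1) // 2


def closed_form_input_tokens(base: int, growth: int, steps: int) -> int:
    """N * base + growth * N(N-1)/2, the same total without the loop."""
    return steps * base + quadratic_term(growth, steps)


# Hypothetical sizes, chosen only to make the arithmetic easy to follow.
BASE, GROWTH = 4_000, 1_000

for n in (10, 20, 40):
    total = uncached_input_tokens(BASE, GROWTH, n)
    # The explicit sum and the closed form must agree.
    assert total == closed_form_input_tokens(BASE, GROWTH, n)
    share = quadratic_term(GROWTH, n) / total
    print(f"N={n:>2}  input tokens={total:>9,}  quadratic share={share:.0%}")
```

Doubling N from 20 to 40 in this sketch roughly quadruples the quadratic term, while the linear N·base part only doubles, so the total grows by somewhat less than 4× until growth dominates.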

Prefix caching is the escape hatch. Because each turn's prefix is byte-identical to the previous turn's, it bills as a cache read at roughly a tenth of the input rate, and only the newly appended slice, about growth tokens, is fresh. That pulls the bill back toward linear: fresh input across the run is only base + (N−1)·growth, while the re-read prefix still grows with N² in token count but is charged at the discounted rate. The gap between the two numbers this page shows is the entire return on turning caching on for an agentic workload; the mechanics are in prompt caching for AI agents. Caching only helps while the prefix stays stable, which is also the argument against rewriting context mid-run: every edit upstream invalidates the cache below it.
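A minimal Python sketch of that split between fresh and cached input follows. It makes several simplifying assumptions: the first turn is billed entirely as fresh input, each later turn adds exactly growth fresh tokens, cache reads cost a flat tenth of the input rate, and there is no cache-write surcharge or cache expiry. The function names, token sizes, and the $3 per million input tokens rate are hypothetical, and output tokens are ignored.

```python
def cached_run_tokens(base: int, growth: int, steps: int) -> tuple[int, int]:
    """Return (fresh, cache_read) input tokens for one run with prefix caching."""
    fresh = 0
    cache_read = 0
    for t in range(1, steps + 1):
        context = base + (t - 1) * growth  # tokens the model reads on turn t
        if t == 1:
            fresh += context  # nothing is cached before the first turn
        else:
            fresh += growth  # only the newly appended slice is fresh
            cache_read += context - growth  # the rest matches the prior turn
    return fresh, cache_read


def run_input_cost(base: int, growth: int, steps: int,
                   input_rate: float, cache_read_ratio: float = 0.1) -> float:
    """Input cost in dollars; input_rate is dollars per million tokens."""
    fresh, cache_read = cached_run_tokens(base, growth, steps)
    billable = fresh + cache_read * cache_read_ratio
    return billable * input_rate / 1_000_000


# Hypothetical figures for illustration only.
BASE, GROWTH, STEPS, RATE = 4_000, 1_000, 20, 3.0

fresh, cache_read = cached_run_tokens(BASE, GROWTH, STEPS)
# Fresh input matches the closed form base + (N - 1) * growth.
assert fresh == BASE + (STEPS - 1) * GROWTH

uncached = (fresh + cache_read) * RATE / 1_000_000  # every token at full rate
cached = run_input_cost(BASE, GROWTH, STEPS, RATE)
print(f"fresh={fresh:,}  cache_read={cache_read:,}")
print(f"uncached input ${uncached:.4f}  cached input ${cached:.4f}")
```

Real providers differ on cache-write pricing, minimum cacheable length, and how long a cached prefix survives, so treat the ratio and the billing split here as placeholders to replace with the figures for the model in use.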

The defaults are illustrative order-of-magnitude figures; every field is editable, including the list prices (a June 2026 snapshot). Sizing the window those turns consume instead? See the context-window budget calculator. Serving the model yourself? The VRAM calculator covers the hardware side.
