Cost / request: $0.0235
Monthly cost: $2,350
Annual cost: $28,200
Prompt-cache savings / mo: $900

$2,350/mo (~$28,200/yr) for 100,000 requests a month. Prompt caching trims 28% off the bill; output tokens are 53% of per-request cost.

How the estimate works

An API bill is just tokens times a rate, but two levers do most of the work. Prompt caching is the first: a cache read bills at roughly a tenth of the base input rate on every major provider, so the large, unchanging head of a prompt — the system prompt, the tool definitions, the retrieved context — costs ~90% less on the second and later calls that reuse it. Set "of which cached" to the slice of your input that repeats, and the calculator splits the input bill into cached and uncached at their separate rates.
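A minimal sketch of that cached/uncached split, assuming a flat 0.1x cache-read multiplier and ignoring any cache-write surcharge or cache expiry. The function name, the token counts, and the $3.00-per-million rate are hypothetical placeholders, not the calculator's own code or any provider's list price.

```python
def split_input_cost(input_tokens, cached_fraction, input_rate_per_mtok,
                     cache_read_multiplier=0.10):
    """Return (uncached_usd, cached_usd) for one request's input tokens.

    cache_read_multiplier=0.10 is an assumption: a cache read billed at
    roughly a tenth of the base input rate.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached_tokens = input_tokens * cached_fraction
    uncached_tokens = input_tokens - cached_tokens
    rate_per_token = input_rate_per_mtok / 1_000_000
    uncached_usd = uncached_tokens * rate_per_token
    cached_usd = cached_tokens * rate_per_token * cache_read_multiplier
    return uncached_usd, cached_usd


# Hypothetical request: 6,000 input tokens, half of them a repeated prefix,
# billed at an assumed $3.00 per million input tokens.
uncached_usd, cached_usd = split_input_cost(6_000, 0.5, 3.00)
with_cache = uncached_usd + cached_usd
without_cache = sum(split_input_cost(6_000, 0.0, 3.00))

print(f"input cost with caching:    ${with_cache:.4f}")
print(f"input cost without caching: ${without_cache:.4f}")
```

With these placeholder numbers the input side drops from about $0.0180 to about $0.0099 per request. The cached half of the prompt is the only part that gets cheaper, which is why the size of the repeating slice matters more than the headline discount.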

The second lever is the input/output split. Output tokens are priced 3–6× higher than input across these models, so the bill for a verbose agent that writes long answers is dominated by what it emits, not what it reads, which is why the verdict reports output's share of per-request cost. Trimming a rambling response often beats trimming the prompt.
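To see why, here is a rough sketch that compares cutting 20% of the output against cutting 20% of the input. It ignores caching for simplicity, and the rates ($3.00 and $15.00 per million tokens, a 5x ratio) and token counts are hypothetical values chosen only to sit inside the 3–6× range.

```python
def output_share(input_tokens, output_tokens,
                 input_rate_per_mtok, output_rate_per_mtok):
    """Return (total_usd, output_fraction) for one request, ignoring caching."""
    input_usd = input_tokens * input_rate_per_mtok / 1_000_000
    output_usd = output_tokens * output_rate_per_mtok / 1_000_000
    total_usd = input_usd + output_usd
    return total_usd, output_usd / total_usd


# Hypothetical request: 3,000 tokens in, 1,000 tokens out.
base_total, base_share = output_share(3_000, 1_000, 3.00, 15.00)

# Trim 20% of the answer versus 20% of the prompt.
shorter_answer, _ = output_share(3_000, 800, 3.00, 15.00)
shorter_prompt, _ = output_share(2_400, 1_000, 3.00, 15.00)

print(f"baseline:       ${base_total:.4f} (output share {base_share:.1%})")
print(f"shorter answer: ${shorter_answer:.4f}")
print(f"shorter prompt: ${shorter_prompt:.4f}")
```

Under these assumptions the output is a quarter of the tokens but 62.5% of the cost, so the shorter answer saves more ($0.0030) than the shorter prompt ($0.0018). The comparison can flip when the prompt is much longer than the response or the price ratio is lower, hence "often" rather than "always".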

List prices are a dated snapshot (June 2026) and a starting point — providers revise them, and batch APIs cut another ~50% for non-interactive work. Every price field is editable; drop in your own contract rate. The reasoning behind each lever is in how to reduce an AI agent's token costs, prompt caching for AI agents, and the cross-provider prompt-caching pricing comparison. Self-hosting instead? The LLM serving VRAM calculator sizes the GPU side.
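The headline figures can be checked by hand, and the same arithmetic shows what a batch discount might do. This is a sketch under stated assumptions: the function name is hypothetical, the 50% batch discount is applied uniformly to whatever fraction of traffic is batchable (real providers may combine batch and cache discounts differently), and the 28% figure is read as savings measured against the bill without caching.

```python
def project(cost_per_request, requests_per_month,
            batch_fraction=0.0, batch_discount=0.5):
    """Return (monthly_usd, annual_usd).

    batch_fraction is the assumed share of requests that can run through a
    batch API; batch_discount=0.5 mirrors the ~50% figure in the text.
    """
    monthly = cost_per_request * requests_per_month
    monthly *= 1 - batch_fraction * batch_discount
    return monthly, monthly * 12


# Figures from the verdict above: $0.0235 per request, 100,000 requests a month.
monthly, annual = project(0.0235, 100_000)

# Assumption: the $900 monthly cache savings is measured against the bill
# as it would be without caching (monthly + savings).
cache_savings = 900
savings_share = cache_savings / (monthly + cache_savings)

# Hypothetical scenario: 40% of the traffic is non-interactive and batched.
batched_monthly, _ = project(0.0235, 100_000, batch_fraction=0.4)

print(f"monthly ${monthly:,.0f}, annual ${annual:,.0f}")
print(f"cache savings share: {savings_share:.0%}")
print(f"monthly with 40% batched: ${batched_monthly:,.0f}")
```

The first two lines reproduce the $2,350 and $28,200 totals and the 28% savings share. The batched scenario comes to about $1,880 a month, but only under the uniform-discount assumption; swap in your own contract terms before relying on it.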
