The Real Cost of AI Agents (Multi-Step)

Multi-step agents are among the most expensive things you can build with an LLM, and the hardest to forecast. A single user task fans out to 5–12 model calls as the agent plans, calls tools, reads results, and decides what to do next. At 8 calls per task on a premium model, that’s ~13× the cost of a simple chatbot, and that’s before two agent-specific cost drivers that a chatbot doesn’t have.

The baseline: ~13× a chatbot

Take an agent averaging 8 calls per task, 1,500 input / 600 output tokens per call, 1,000 tasks/day (30,000/month). Naive monthly cost:

Model               $/call      Monthly (× 8 calls × 30k tasks)
Claude Sonnet 4.6   $0.0135     $3,240/mo
GPT-5.4             $0.01275    $3,060/mo
GPT-5.4 mini        $0.003825   $918/mo
Gemini 2.5 Flash    $0.00195    $468/mo

Sonnet’s $3,240 is ~13× what the same model costs running a simple chatbot ($252/mo). The cheapest-to-priciest spread is ~7× ($468 vs $3,240), but with agents the cheap option can be a trap, for reasons we’ll get to.
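The baseline arithmetic can be reproduced with a short sketch. The per-call figures are backed out of the monthly totals above (monthly cost ÷ 240,000 calls), so some carry more digits than a four-decimal $/call column would show. The names `PER_CALL_USD` and `monthly_cost` are illustrative and are not part of any calculator API.

```python
# Per-call cost in USD at 1,500 input / 600 output tokens, derived from the
# article's monthly totals (monthly / 240,000 calls).
PER_CALL_USD = {
    "Claude Sonnet 4.6": 0.0135,
    "GPT-5.4": 0.01275,
    "GPT-5.4 mini": 0.003825,
    "Gemini 2.5 Flash": 0.00195,
}


def monthly_cost(cost_per_call: float, calls_per_task: int = 8,
                 tasks_per_month: int = 30_000) -> float:
    """Naive monthly cost: fixed calls per task, no retries, no caching."""
    return cost_per_call * calls_per_task * tasks_per_month


if __name__ == "__main__":
    for model, per_call in PER_CALL_USD.items():
        print(f"{model:<18} ${monthly_cost(per_call):>8,.0f}/mo")
```

Changing `calls_per_task` or `tasks_per_month` rescales every row by the same factor, which is why the model ranking stays fixed in this naive view.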

Driver 1: step count is unpredictable — and it’s a multiplier

A chatbot is always one call. An agent is 5, or 8, or 12, depending on how the task unfolds — and step count multiplies the entire bill. Same model, same tokens, just different chattiness:

Claude Sonnet 4.6, 1,000 tasks/day
   5 calls/task  →  $2,025/mo
   8 calls/task  →  $3,240/mo
  12 calls/task  →  $4,860/mo

A “chatty” agent costs 2.4× a lean one for the identical task. This is the number teams miss most: you can’t budget an agent from average tokens alone, because the call count is a variable you only partly control. Capping max steps isn’t just a safety rail — it’s a direct cost control.

Model this yourself

The multi-step agent archetype, prefilled — 8 calls/task, 1,500/600 tokens. Drag calls-per-request from 5 to 12 and watch the monthly cost swing across models.


Driver 2: cascading retries

For a chatbot, a failed call retries once — a small, bounded cost. For an agent, a failure mid-chain is different: if step 5 of 8 fails, you may have to re-run not just that call but every step after it. One bad tool result can cost you half the chain again.

That’s why the cost engine doesn’t use a flat retry rate for agents — it models cascading retries in the Worst Case column, where failures compound across the steps instead of adding a fixed percentage. On a four-figure base cost, the gap between Realistic and Worst Case for an agent is real money, and it’s the number to stress-test before you ship.

Two agents with the same average token counts can have wildly different bills — one runs lean and rarely retries, the other wanders and re-runs chains. The Worst Case column exists precisely to show you the bad-day exposure agents carry that chatbots don’t.
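The difference between a flat retry rate and a cascading one can be sketched as follows. This is a simplified illustration, not the cost engine’s actual Worst Case model: it assumes each step fails independently with probability `p_fail` (a hypothetical value), that a failure at step k forces one re-run of steps k through n, and that the re-run itself does not fail.

```python
def expected_calls_flat(n_steps: int, p_fail: float) -> float:
    """Flat model: each failed call is retried once, costing one extra call."""
    return n_steps + n_steps * p_fail


def expected_calls_cascading(n_steps: int, p_fail: float) -> float:
    """Cascading model: a failure at step k re-runs steps k..n once.

    First-order approximation that ignores failures inside the re-run.
    """
    rerun = sum(p_fail * (n_steps - k + 1) for k in range(1, n_steps + 1))
    return n_steps + rerun


if __name__ == "__main__":
    n, p = 8, 0.05                      # 8-step chain, HYPOTHETICAL 5% failure rate
    per_call, tasks = 0.0135, 30_000    # Sonnet figures from the article
    for name, fn in (("flat", expected_calls_flat),
                     ("cascading", expected_calls_cascading)):
        calls = fn(n, p)
        print(f"{name:<10} {calls:.1f} calls/task  ${per_call * calls * tasks:,.0f}/mo")
```

With these assumptions the flat model adds 0.4 calls per task while the cascading model adds 1.8, because the re-run cost grows with the square of chain length (n(n+1)/2 steps in the sum) and not linearly.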

The cheap-model trap

Gemini Flash at $468/mo looks like a steal next to Sonnet at $3,240. Sometimes it is — but agents are the one place where cheaper-per-call can cost more overall. A weaker model reasons worse, so it takes more steps to complete the same task (and retries more when it goes off the rails). If a budget model averages 12 calls where a premium model needs 6, the per-call savings partly evaporate into extra calls. The right comparison isn’t $/call — it’s $/completed-task, and that’s a function of capability, not just price.
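A rough way to compare on $/completed-task, using the 12-call versus 6-call example above. The function name and the `completion_rate` parameter are illustrative assumptions; dividing by the completion rate assumes a failed task costs as much as a finished one and has to be paid for anyway.

```python
def cost_per_completed_task(cost_per_call: float, avg_calls: float,
                            completion_rate: float = 1.0) -> float:
    """Spend per task that actually finishes, in USD."""
    if not 0 < completion_rate <= 1:
        raise ValueError("completion_rate must be in (0, 1]")
    return cost_per_call * avg_calls / completion_rate


if __name__ == "__main__":
    budget = cost_per_completed_task(0.00195, 12)   # Gemini 2.5 Flash, 12 calls
    premium = cost_per_completed_task(0.0135, 6)    # Claude Sonnet 4.6, 6 calls
    print(f"per-call gap: {0.0135 / 0.00195:.1f}x")
    print(f"per-task gap: {premium / budget:.1f}x")

    # HYPOTHETICAL: the budget model only finishes 60% of tasks.
    budget_60 = cost_per_completed_task(0.00195, 12, completion_rate=0.6)
    print(f"per-task gap at 60% completion: {premium / budget_60:.1f}x")
```

In this example the budget model is still cheaper per task, but the roughly 7× per-call advantage shrinks to about 3.5× once the extra calls are counted, and narrows further if its completion rate is lower.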

How to control agent cost

Cap max steps. A hard step limit is the single biggest lever — it bounds the multiplier directly.
Set a retry budget. Max retries per call (and per hour) turns the cascading-failure tail into a known cap instead of an open-ended risk.
Route by step difficulty. Use a cheap model for mechanical sub-steps and a premium model only for the hard reasoning — a multi-model routing pattern can cut the bill without dropping task quality.
Cache the system prompt. The agent’s instructions and tool definitions are re-sent on every step — a perfect, high-frequency caching target.
Optimize on $/completed-task. Pick the model with the lowest cost per finished task, counting its step count and retries, not the one with the lowest sticker price.
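As an illustration of the routing lever, the sketch below estimates a blended bill when some steps go to a cheap model. The 5/3 split between mechanical and reasoning steps is hypothetical, and the sketch assumes every step keeps the same 1,500/600 token profile and that routing does not change the total step count.

```python
FLASH_PER_CALL = 0.00195    # Gemini 2.5 Flash, per-call figure from the article
SONNET_PER_CALL = 0.0135    # Claude Sonnet 4.6, per-call figure from the article
TASKS_PER_MONTH = 30_000


def routed_cost_per_task(plan: list[tuple[int, float]]) -> float:
    """Sum of (number of calls x cost per call) over each model in the plan."""
    return sum(n_calls * cost_per_call for n_calls, cost_per_call in plan)


if __name__ == "__main__":
    all_premium = [(8, SONNET_PER_CALL)]
    # HYPOTHETICAL split: 5 mechanical steps on Flash, 3 reasoning steps on Sonnet.
    routed = [(5, FLASH_PER_CALL), (3, SONNET_PER_CALL)]
    for name, plan in (("all premium", all_premium), ("routed", routed)):
        monthly = routed_cost_per_task(plan) * TASKS_PER_MONTH
        print(f"{name:<12} ${monthly:,.0f}/mo")
```

Under that assumed split the monthly bill drops from $3,240 to roughly $1,500. Whether quality holds depends on the cheap model actually handling the mechanical steps without adding retries.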

Model your agent — steps per task, token sizes, cascading retries, and the Worst Case exposure — across every model.
