Multi-step agents are the most expensive thing you can build with an LLM, and the hardest to forecast. A single user task fans out into 5–12 model calls as the agent plans, calls tools, reads results, and decides what to do next. At 8 calls per task on a premium model, that’s roughly 13× the cost of a simple chatbot, and that’s before two agent-specific cost drivers that a chatbot never has to deal with.
The baseline: ~13× a chatbot
Take an agent averaging 8 calls per task, 1,500 input / 600 output tokens per call, 1,000 tasks/day (30,000/month). Naive monthly cost:
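| Model | $/call | Monthly (× 8 calls × 30k tasks) |
| --- | --- | --- |
| Claude Sonnet 4.6 | $0.0135 | $3,240 |
| GPT-5.4 | $0.0127 | $3,060 |
| GPT-5.4 mini | $0.0038 | $918 |
| Gemini 2.5 Flash | $0.0019 | $468 |

Per-call figures are rounded to four decimal places, so multiplying them out can land a few dollars off the monthly totals (e.g. $3,048 rather than $3,060 for GPT-5.4).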
Sonnet’s $3,240 is ~13× the same model running a simple chatbot ($252/mo). The cheapest-to-priciest spread is ~7× ($468 vs $3,240) — but with agents, the cheap option is a trap for reasons we’ll get to.
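Each row of the table is a single multiplication. Here is a minimal Python sketch of that arithmetic. It assumes per-call cost is input tokens × input price plus output tokens × output price. The $3 and $15 per-million-token prices are an assumption, chosen because they reproduce the $0.0135 Sonnet figure, so swap in current list prices for any model you compare. `cost_per_call` and `monthly_cost` are illustrative names, not part of any library.

```python
def cost_per_call(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Dollar cost of one model call, with prices quoted per million tokens."""
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000


def monthly_cost(per_call, calls_per_task, tasks_per_month):
    """Naive monthly cost: every task makes the same number of calls, no retries."""
    return per_call * calls_per_task * tasks_per_month


# Assumed prices ($ per million tokens) that reproduce the $0.0135 Sonnet per-call figure.
sonnet_call = cost_per_call(1_500, 600, input_price_per_m=3.00, output_price_per_m=15.00)
print(f"${sonnet_call:.4f} per call")                             # $0.0135 per call
print(f"${monthly_cost(sonnet_call, 8, 30_000):,.0f} per month")  # $3,240 per month
```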
Driver 1: step count is unpredictable — and it’s a multiplier
A chatbot is always one call. An agent is 5, or 8, or 12, depending on how the task unfolds — and step count multiplies the entire bill. Same model, same tokens, just different chattiness:
Claude Sonnet 4.6, 1,000 tasks/day:

| Calls/task | Monthly cost |
| --- | --- |
| 5 | $2,025 |
| 8 | $3,240 |
| 12 | $4,860 |
A “chatty” agent costs 2.4× a lean one for the identical task. This is the number teams miss most: you can’t budget an agent from average tokens alone, because the call count is a variable you only partly control. Capping max steps isn’t just a safety rail — it’s a direct cost control.
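The sensitivity rows for Claude Sonnet 4.6 can be reproduced in a short Python sketch, which also shows what a step cap does to a long-tailed step distribution. The per-call cost is the Sonnet figure from the baseline table. The `observed` call counts are made-up sample data, and `sonnet_monthly` and `mean_calls` are hypothetical helpers. A cap only bounds cost. Tasks cut off at the cap may end unfinished, so the limit still has to be tuned against completion rate.

```python
PER_CALL = 0.0135         # Claude Sonnet 4.6 $/call, from the baseline table
TASKS_PER_MONTH = 30_000  # 1,000 tasks/day


def sonnet_monthly(calls_per_task):
    """Monthly cost at a given average number of calls per task."""
    return PER_CALL * calls_per_task * TASKS_PER_MONTH


for calls in (5, 8, 12):
    print(f"{calls} calls/task -> ${sonnet_monthly(calls):,.0f}/mo")
print(f"chatty vs lean: {sonnet_monthly(12) / sonnet_monthly(5):.1f}x")  # 2.4x


def mean_calls(counts, max_steps=None):
    """Average calls per task; with a cap, no task can exceed max_steps calls."""
    capped = [c if max_steps is None else min(c, max_steps) for c in counts]
    return sum(capped) / len(capped)


observed = [5, 6, 8, 8, 9, 12, 20, 25]  # made-up call counts with a long tail
print(mean_calls(observed))                # 11.625
print(mean_calls(observed, max_steps=12))  # 9.0
```

The two runaway tasks (20 and 25 calls) drag the uncapped average above 11. With a 12-step cap, the same sample averages 9 calls, and no single task can cost more than 12 × $0.0135.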
Driver 2: cascading retries
For a chatbot, a failed call retries once — a small, bounded cost. For an agent, a failure mid-chain is different: if step 5 of 8 fails, you may have to re-run not just that call but every step after it. One bad tool result can cost you half the chain again.
That’s why the cost engine doesn’t use a flat retry rate for agents — it models cascading retries in the Worst Case column, where failures compound across the steps instead of adding a fixed percentage. On a four-figure base cost, the gap between Realistic and Worst Case for an agent is real money, and it’s the number to stress-test before you ship.
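Here is a first-order way to see why cascading retries compound. Assume each of the n steps independently produces a bad result with probability p, and that a bad result at step k forces one re-run of steps k through n (the re-runs are assumed to succeed). A flat retry model then adds p·n extra calls per task. The cascading model adds Σ p·(n − k + 1) = p·n(n + 1)/2. This Python sketch uses a hypothetical 5% per-step failure rate. It illustrates the effect and is not the cost engine’s actual Worst Case formula.

```python
PER_CALL = 0.0135         # Claude Sonnet 4.6 $/call, from the baseline table
TASKS_PER_MONTH = 30_000  # 1,000 tasks/day


def extra_calls_flat(n_steps, p_fail):
    """Flat model: each bad step is retried once, on its own."""
    return p_fail * n_steps


def extra_calls_cascading(n_steps, p_fail):
    """Cascading model: a bad result at step k re-runs steps k..n (first-order, one failure)."""
    return sum(p_fail * (n_steps - k + 1) for k in range(1, n_steps + 1))


n, p = 8, 0.05  # 8-step chain, hypothetical 5% chance that any one step goes bad
for label, extra in (("flat", extra_calls_flat(n, p)),
                     ("cascading", extra_calls_cascading(n, p))):
    monthly = extra * PER_CALL * TASKS_PER_MONTH
    print(f"{label}: +{extra:.1f} calls/task, +${monthly:,.0f}/mo")
# flat: +0.4 calls/task, +$162/mo
# cascading: +1.8 calls/task, +$729/mo
```

With the same failure rate, the cascading penalty is 4.5× the flat one. Because the cascade term grows with the square of chain length, a compounding Worst Case pulls further away from a flat percentage as agents get longer.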
The cheap-model trap
Gemini Flash at $468/mo looks like a steal next to Sonnet at $3,240. Sometimes it is — but agents are the one place where cheaper-per-call can cost more overall. A weaker model reasons worse, so it takes more steps to complete the same task (and retries more when it goes off the rails). If a budget model averages 12 calls where a premium model needs 6, the per-call savings partly evaporate into extra calls. The right comparison isn’t $/call — it’s $/completed-task, and that’s a function of capability, not just price.
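Here is a minimal Python sketch of $/completed-task. It assumes a failed attempt is paid for in full, delivers nothing, and is simply re-run, so the expected number of attempts per completed task is 1 / success_rate. The per-call prices are the Sonnet figure and the unrounded Gemini 2.5 Flash figure implied by $468/mo. The call counts echo the 6-vs-12 example above. The success rates are made-up placeholders to replace with your own eval numbers, and `cost_per_completed_task` is a hypothetical helper.

```python
def cost_per_completed_task(per_call, calls_per_attempt, success_rate):
    """Expected $ per finished task when failed attempts are paid for and re-run."""
    cost_per_attempt = per_call * calls_per_attempt
    return cost_per_attempt / success_rate  # 1 / success_rate attempts on average


# (per-call $, calls per attempt, success rate); the success rates are hypothetical.
models = {
    "Claude Sonnet 4.6": (0.0135, 6, 0.95),
    "Gemini 2.5 Flash": (0.00195, 12, 0.70),
}

per_task = {name: cost_per_completed_task(*args) for name, args in models.items()}
for name, cost in per_task.items():
    print(f"{name}: ${cost:.4f} per completed task")

per_call_gap = 0.0135 / 0.00195
per_task_gap = per_task["Claude Sonnet 4.6"] / per_task["Gemini 2.5 Flash"]
print(f"gap per call: {per_call_gap:.1f}x, gap per completed task: {per_task_gap:.1f}x")
```

With these placeholder rates, the budget model still comes out ahead, but the gap shrinks from roughly 7× per call to roughly 2.5× per completed task. A lower measured success rate, or failures that carry their own cost, can close the gap further.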
How to control agent cost
- Cap max steps. A hard step limit is the single biggest lever — it bounds the multiplier directly.
- Set a retry budget. Max retries per call (and per hour) turns the cascading-failure tail into a known cap instead of an open-ended risk.
- Route by step difficulty. Use a cheap model for mechanical sub-steps and a premium model only for the hard reasoning — a multi-model routing pattern can cut the bill without dropping task quality.
- Cache the system prompt. The agent’s instructions and tool definitions are re-sent on every step — a perfect, high-frequency caching target.
- Optimize on $/completed-task. Pick the model with the lowest cost per finished task (price per call × calls per task, including retries and failed attempts), not the lowest sticker price.
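The first two levers can be enforced in code around the agent loop. Below is a minimal Python sketch of a hypothetical `AgentBudget` guard (not taken from any agent framework). It provides a hard step cap per task, a per-call retry cap, and a sliding one-hour retry budget. The defaults (12 steps, 2 retries per call, 100 retries per hour) are placeholders. Retries are model calls too, so the sketch expects callers to charge a step for them as well. That keeps `max_steps × $/call` a true per-task worst case.

```python
import time
from collections import deque


class BudgetExceeded(RuntimeError):
    """Raised when an agent run hits its step cap or a retry budget."""


class AgentBudget:
    """Hypothetical guard: hard step cap per task plus per-call and per-hour retry caps."""

    WINDOW_SECONDS = 3600  # sliding window for the hourly retry budget

    def __init__(self, max_steps=12, max_retries_per_call=2,
                 max_retries_per_hour=100, clock=time.monotonic):
        self.max_steps = max_steps
        self.max_retries_per_call = max_retries_per_call
        self.max_retries_per_hour = max_retries_per_hour
        self.clock = clock
        self.steps = 0
        self._retry_times = deque()  # retry timestamps, shared across tasks

    def start_task(self):
        """Reset the per-task step counter (the hourly retry window persists)."""
        self.steps = 0

    def charge_step(self):
        """Call before every model call, retries included."""
        if self.steps >= self.max_steps:
            raise BudgetExceeded(f"step cap of {self.max_steps} reached")
        self.steps += 1

    def charge_retry(self, retries_so_far):
        """Call before retrying a call that has already been retried `retries_so_far` times."""
        if retries_so_far >= self.max_retries_per_call:
            raise BudgetExceeded(f"per-call retry cap of {self.max_retries_per_call} reached")
        now = self.clock()
        # Drop retries that have aged out of the one-hour window.
        while self._retry_times and now - self._retry_times[0] >= self.WINDOW_SECONDS:
            self._retry_times.popleft()
        if len(self._retry_times) >= self.max_retries_per_hour:
            raise BudgetExceeded("hourly retry budget exhausted")
        self._retry_times.append(now)

    def worst_case_cost(self, per_call):
        """Upper bound on $ per task, given every call (retries too) is charged as a step."""
        return self.max_steps * per_call


# Usage: a 3-step cap stops the fourth call.
budget = AgentBudget(max_steps=3)
budget.start_task()
for _ in range(3):
    budget.charge_step()
try:
    budget.charge_step()
except BudgetExceeded as err:
    print(err)  # step cap of 3 reached
```

Raising an exception instead of silently truncating forces the caller to decide what a capped task means, for example returning a partial answer or escalating, rather than letting the bill run open-ended.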