LLM cost archetype

Multi-model router — LLM cost calculator & pricing model

Your app uses a routing layer to decide which LLM to call based on the complexity or type of each request. Simple questions go to a cheap, fast model. Complex or sensitive tasks escalate to a frontier model. The cost depends almost entirely on how accurately the router classifies tasks.

Does this sound like your app?

□ You use more than one LLM, and different requests go to different models
□ You have a 'cheap path' (GPT-5.4 mini, Haiku, Gemini Flash) and an 'expensive path' (GPT-5.4, Claude Sonnet, Gemini Pro)
□ A classifier, rules engine, or small LLM decides which model to use
□ Routing accuracy directly affects your cost: misrouting is a budget risk

Real-world example

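A B2B support platform. Simple queries like 'reset my password' route to GPT-5.4 mini (~0.07¢). Complex queries like 'explain why my enterprise integration is failing' escalate to GPT-5.4 (~0.7¢), about 10× more. At these figures, misrouting 10% of simple queries to the expensive model nearly doubles what those queries cost (0.9 × 0.07¢ + 0.1 × 0.7¢ ≈ 0.13¢ on average instead of 0.07¢), and higher misroute rates push the multiplier up quickly.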
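The effect of misrouting on the blended cost can be sketched in a few lines. This is a rough illustration, not the calculator's actual formula: the function name and the `simple_share` parameter are hypothetical, the two per-query prices are the approximate figures quoted in the example, and it assumes complex queries always reach the expensive model.

```python
def blended_cost_cents(simple_share, misroute_rate,
                       cheap_cents=0.07, expensive_cents=0.7):
    """Expected cost per query, in cents.

    simple_share:  fraction of queries that are genuinely simple (0..1).
    misroute_rate: fraction of simple queries wrongly sent to the
                   expensive model (0..1).
    Assumption: complex queries are always routed to the expensive model.
    """
    simple_to_cheap = simple_share * (1 - misroute_rate)
    to_expensive = 1 - simple_to_cheap
    return simple_to_cheap * cheap_cents + to_expensive * expensive_cents


# Compare a perfectly routed all-simple workload with a 10% misroute rate.
baseline = blended_cost_cents(simple_share=1.0, misroute_rate=0.0)
misrouted = blended_cost_cents(simple_share=1.0, misroute_rate=0.10)
multiplier = misrouted / baseline
print(f"baseline {baseline:.3f}¢, misrouted {misrouted:.3f}¢, {multiplier:.2f}x")
```

Lowering `simple_share` shrinks the multiplier, because a workload that already sends many queries to the expensive model has a higher baseline to begin with.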

Default cost profile

Calls per request: 3
Batch-eligible: no
Avg input tokens: 1000
Avg output tokens: 400

A routing layer sends simple tasks to cheap models and complex tasks to frontier models, so cost depends heavily on your model mix ratio (the share of requests that reaches each model). The profile assumes 2–4 LLM calls per request (default 3): one classification call to determine complexity, then one or more model calls routed accordingly. Input tokens are moderate (~1,000) because routing prompts are concise. Prompt caching gives the usual savings on the repeated routing prompt. The workload is not batch-eligible because routing decisions are made in real time.
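As a minimal sketch of how the default profile turns into a per-request cost, the snippet below models one classification call on the cheap model plus the remaining calls split by model mix. Several things here are assumptions rather than part of the profile: the token averages are applied per call, the `Price` values are placeholders (not any vendor's real rates), and `frontier_share` is a hypothetical name for the fraction of routed calls that reach the frontier model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_mtok: float   # USD per million input tokens
    output_per_mtok: float  # USD per million output tokens


def call_cost(price, input_tokens=1000, output_tokens=400):
    """Cost in USD of one LLM call at the profile's average token counts."""
    return (input_tokens * price.input_per_mtok
            + output_tokens * price.output_per_mtok) / 1_000_000


def request_cost(cheap, frontier, frontier_share, calls=3):
    """One classification call on the cheap model, then (calls - 1)
    routed calls split between the two models by frontier_share."""
    routed = (frontier_share * call_cost(frontier)
              + (1 - frontier_share) * call_cost(cheap))
    return call_cost(cheap) + (calls - 1) * routed


# Placeholder prices for illustration only; substitute your own rates.
CHEAP = Price(input_per_mtok=0.50, output_per_mtok=2.00)
FRONTIER = Price(input_per_mtok=5.00, output_per_mtok=20.00)

for share in (0.0, 0.2, 0.5, 1.0):
    print(f"frontier share {share:.0%}: ${request_cost(CHEAP, FRONTIER, share):.5f} per request")
```

Sweeping `frontier_share` like this shows how strongly the total tracks the mix ratio. Prompt caching is left out for simplicity, so the sketch leans toward overestimating.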

Rough cost

$15–400/mo at 100–1,000 requests/day. Highly sensitive to model mix ratio.
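To sanity-check a monthly figure against your own traffic, the arithmetic is per-request cost × requests per day × days per month. The helper below is a hypothetical sketch that assumes a 30-day month. Pairing the low end of the range with 100 requests/day and the high end with 1,000 requests/day is also an assumption, used only to back out the per-request cost that each end would imply.

```python
def monthly_cost(per_request_usd, requests_per_day, days=30):
    """Monthly spend in USD, assuming steady traffic over a 30-day month."""
    return per_request_usd * requests_per_day * days


def implied_per_request(monthly_usd, requests_per_day, days=30):
    """Per-request cost in USD that a given monthly bill implies."""
    return monthly_usd / (requests_per_day * days)


# Back out the per-request cost at each end of the quoted range.
low = implied_per_request(15, 100)      # low end: $15/mo at 100 requests/day
high = implied_per_request(400, 1000)   # high end: $400/mo at 1,000 requests/day
print(f"implied per-request cost: ${low:.4f} to ${high:.4f}")
```

If your own per-request estimate falls far outside the implied band, the likely culprit is a model mix ratio that differs from the one the default profile assumes.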

Red flag

If every request always goes to the same model regardless of complexity, this is not a multi-model router — use a simpler archetype.
