The Cheaper LLM Isn't Always Cheaper

You compare two models, pick the cheaper one, and ship. But “cheaper” was a snapshot — it was true at one retry rate, one cache hit rate, one set of assumptions. Nudge any of those and the ranking can flip. A model you chose because it was 20% cheaper can quietly become the more expensive option on a bad-traffic week, and your dashboard won’t tell you why. That fragility is what the new sensitivity tripwire is built to expose.

Why the ranking flips: retries bill uncached tokens

Prompt caching is usually the biggest lever on an LLM bill: cached input reads can run up to ~90% cheaper than uncached input, depending on the provider. So a model with strong cache economics can win a comparison decisively at a low retry rate. Here’s the catch: retries re-bill raw, uncached tokens. A failed-and-retried call doesn’t get the cache discount; it pays full input price again.

So the model that won because of caching is exactly the model whose advantage erodes fastest as failures climb. Every point of retry rate hands back some of its cache savings. Push the retry rate high enough and its cost curve crosses the other model’s; that crossing is the break-even. Call the cache-favored model Model A and its rival Model B: below the break-even, Model A is cheaper; above it, Model B. Your “20% cheaper” decision was only ever true on one side of a line you couldn’t see.
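To make the flip concrete, here is a minimal sketch of the retry effect in Python. It is not the calculator's cost engine: the `Model` class, the `cost_per_call` function, every price, and the token counts are made up for illustration. It also assumes at most one retry per call and that a retry re-bills output tokens as well as full-price input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    input_price: float     # USD per 1M input tokens, list price (made up)
    output_price: float    # USD per 1M output tokens (made up)
    cache_discount: float  # fraction taken off cached input reads


def cost_per_call(model, retry_rate, cache_hit_rate=0.8,
                  input_tokens=10_000, output_tokens=500):
    """Expected USD per logical call, assuming at most one retry per call."""
    inp = input_tokens * model.input_price / 1e6
    out = output_tokens * model.output_price / 1e6
    # First attempt: the cached share of the input gets the discount.
    first_attempt = inp * (1 - cache_hit_rate * model.cache_discount) + out
    # Retry: per the argument above, no cache discount applies (assumption).
    retry_attempt = inp + out
    return first_attempt + retry_rate * retry_attempt


# Hypothetical pair: A has the higher list price but strong cache economics.
model_a = Model("Model A", input_price=1.00, output_price=2.00, cache_discount=0.9)
model_b = Model("Model B", input_price=0.40, output_price=2.00, cache_discount=0.0)

for retry_rate in (0.0, 0.1, 0.2, 0.3):
    a = cost_per_call(model_a, retry_rate)
    b = cost_per_call(model_b, retry_rate)
    print(f"retry {retry_rate:.0%}: A=${a:.4f}  B=${b:.4f}")
```

With these invented numbers, Model A starts out roughly a quarter cheaper, the two costs meet at a retry rate of about 20%, and Model B is cheaper beyond that.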

The point estimate isn’t wrong — it’s just fragile. A cost ranking that holds at 5% retries but flips at 12% is a very different thing to build on than one that holds across the whole range. You want to know which you’ve got.

What the tripwire shows

Pick any two models and the tripwire computes the retry-rate break-even for your scenario — the retry rate at which the cheaper model stops being cheaper. The headline reads like:

Gemini 2.5 Flash saves money until retry rate passes ~11%.

If the two curves never cross in range, it says so instead of inventing a number — “stays cheaper across every retry rate — no break-even in range” — so you know when one model simply dominates and when your choice is balanced on a knife edge. Two models with identical pricing (the same model on two providers) tie everywhere and report nothing, which is the honest answer.
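The three outcomes above (a break-even, no break-even in range, and a tie) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tripwire's actual code: each model's cost is treated as a straight line in retry rate, the range is assumed to be 0 to 100%, and the names `break_even` and `headline` plus the exact message wording are hypothetical.

```python
from typing import Optional, Tuple

# (cost at 0% retries, extra cost per unit of retry rate); linearity is assumed
Line = Tuple[float, float]


def break_even(a: Line, b: Line, lo: float = 0.0, hi: float = 1.0) -> Optional[float]:
    """Retry rate strictly inside (lo, hi) where the two lines cross, else None."""
    base_gap = b[0] - a[0]
    slope_gap = a[1] - b[1]
    if slope_gap == 0:
        return None  # parallel or identical lines never cross
    r = base_gap / slope_gap
    return r if lo < r < hi else None


def headline(name_a: str, a: Line, name_b: str, b: Line,
             lo: float = 0.0, hi: float = 1.0) -> str:
    if a == b:
        return ""  # identical pricing: a tie everywhere, nothing to report

    def at(line: Line, r: float) -> float:
        return line[0] + r * line[1]

    r = break_even(a, b, lo, hi)
    # With a crossing, the winner is whoever is cheaper at the low end;
    # without one, the same model is cheaper everywhere, so probe the midpoint.
    probe = lo if r is not None else (lo + hi) / 2
    cheaper = name_a if at(a, probe) < at(b, probe) else name_b
    if r is None:
        return f"{cheaper} stays cheaper across every retry rate (no break-even in range)."
    return f"{cheaper} saves money until retry rate passes ~{r:.0%}."


print(headline("Model A", (0.0038, 0.011), "Model B", (0.005, 0.005)))
print(headline("Model A", (0.0030, 0.004), "Model B", (0.005, 0.005)))
```

Returning `None` rather than clamping to the edge of the range is the design choice that matters here: a crossover outside the range is reported as "no break-even", never as an invented number.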

Model this yourself

Open the RAG pipeline archetype and scroll to the tripwire under the cost table. Pick two models and read the break-even — the headline number is free.

Open in calculator →

Beyond retries: the full sweep

Retry rate is one lever. Your cost ranking is also sensitive to cache hit rate, to where input prices land, and to how much you scale. The full sweep (Pro) runs the same break-even analysis across all of them — cache hit rate (0–80%), input price (0.5×–2× current list), and call volume — for your chosen pair, and reports every crossover. It’s computed server-side against the real cost engine, so the curves reflect the same math as the main table.
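A rough idea of what a multi-parameter sweep involves, sketched in Python under loud assumptions: the `cost` function is a toy stand-in for the real cost engine, all prices and token counts are invented, `find_crossovers` is a hypothetical helper, and the input-price sweep scales only one model's price (the text does not say how the Pro sweep treats the pair). Call volume is left out because this toy model has no volume-dependent pricing.

```python
def cost(input_price, cache_discount, cache_hit_rate=0.8, retry_rate=0.05,
         output_price=2.0, input_tokens=10_000, output_tokens=500):
    """Toy expected USD per call; retries are assumed to pay full, uncached price."""
    inp = input_tokens * input_price / 1e6
    out = output_tokens * output_price / 1e6
    first_attempt = inp * (1 - cache_hit_rate * cache_discount) + out
    return first_attempt + retry_rate * (inp + out)


def find_crossovers(gap, lo, hi, steps=100, tol=1e-9):
    """Scan [lo, hi] for places where gap(x) changes sign; refine by bisection."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    found = []
    for left, right in zip(xs, xs[1:]):
        g_left, g_right = gap(left), gap(right)
        # A strict sign change, or a zero landing exactly on the right grid point.
        if g_left * g_right < 0 or (g_left != 0 and g_right == 0):
            a, b = left, right
            while b - a > tol:
                mid = (a + b) / 2
                if gap(a) * gap(mid) <= 0:
                    b = mid
                else:
                    a = mid
            found.append((a + b) / 2)
    return found  # empty when one model dominates or the two tie everywhere


# Hypothetical pair: A = $1.00/M input with a 90% cache discount,
# B = $0.40/M input with no cache discount.
def cache_gap(h):
    return cost(1.00, 0.9, cache_hit_rate=h) - cost(0.40, 0.0, cache_hit_rate=h)


def price_gap(k):
    return cost(1.00 * k, 0.9) - cost(0.40, 0.0)  # scales A's input price only


sweeps = {
    "cache hit rate": find_crossovers(cache_gap, 0.0, 0.8),
    "input price multiplier": find_crossovers(price_gap, 0.5, 2.0),
}
for parameter, crossovers in sweeps.items():
    print(parameter, [round(x, 3) for x in crossovers])
```

Scanning a grid for sign changes, rather than solving each case in closed form, keeps the same helper usable for any parameter, including ones where the cost curve is not a straight line.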

The practical use: before you commit to a model, check how close you are to a break-even on the parameter you’re least sure about. If your retry rate estimate is shaky and the break-even sits at 9%, that’s a decision worth revisiting. If the nearest crossover is 40 percentage points away from your estimate, your choice is robust, so ship it.

Why this exists

Every token calculator gives you a point estimate. None of them tells you how much to trust it. A cost model that says $237/mo is only useful if you also know whether the ranking behind that number survives a bad week. The tripwire turns a single fragile number into a statement about robustness, which is the difference between pricing and modeling cost.

Model your architecture, pick two models, and see where their costs cross. Free headline break-even; full multi-parameter sweep on Pro.

Try the sensitivity tripwire →