LLM cost archetype

RAG pipeline — LLM cost calculator & pricing model

Your app searches a knowledge base (documents, a database, a website) and uses what it finds to answer the user's question. Under the hood, this takes several LLM calls, not just one.

Does this sound like your app?

Real-world example

An internal company knowledge base. An employee asks 'What's our parental leave policy?' and the app rewrites the query, searches HR documents, reranks the results, then generates an answer citing the relevant policy. That is 3–5 LLM calls per question.
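To make the call chain concrete, here is a minimal sketch of that flow with a stand-in LLM client that only counts calls. Everything named here (`CountingLLM`, `search_documents`, `answer_question`) is hypothetical, and it assumes the search step is a plain index lookup while rewriting, reranking, and answering are one LLM call each. Real pipelines may add more calls, for example a second reranking pass.

```python
from dataclasses import dataclass, field


@dataclass
class CountingLLM:
    """Stand-in for a real LLM client; it only records which steps called it."""
    calls: list = field(default_factory=list)

    def complete(self, step: str, prompt: str) -> str:
        self.calls.append(step)
        return f"[{step} output]"


def search_documents(query: str) -> list:
    # Hypothetical retrieval step (vector or keyword search); not an LLM call.
    return ["Parental leave policy, section 2", "Leave FAQ", "Benefits overview"]


def answer_question(llm: CountingLLM, question: str) -> str:
    # LLM call 1: turn the user's question into a better search query.
    rewritten = llm.complete("rewrite", f"Rewrite as a search query: {question}")
    candidates = search_documents(rewritten)
    # LLM call 2 (assumed): rerank all candidates in a single call.
    ranked = llm.complete("rerank", f"Rank for '{question}': {candidates}")
    # LLM call 3: generate the final answer, citing the top-ranked sources.
    return llm.complete("synthesize", f"Answer '{question}' citing: {ranked}")


llm = CountingLLM()
answer_question(llm, "What's our parental leave policy?")
print(len(llm.calls), llm.calls)
```

This sketch lands at the low end of the 3–5 range; each extra LLM-backed step adds one more entry to `llm.calls`.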

Default cost profile

Calls per request: 4
Batch-eligible: no
Avg input tokens: 2,000
Avg output tokens: 500

Assumes 3–5 LLM calls per user request (default 4): typically a query rewrite, one or more retrieval or reranking calls, and a final synthesis call. Each call averages ~2,000 input tokens, including retrieved context. Prompt caching matters a great deal here because system prompts and retrieved context are re-sent on every call in the chain. Not batch-eligible, since responses are typically user-facing and latency-sensitive.
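The per-request token volume implied by this profile is simple multiplication. The sketch below works it out for 3, 4, and 5 calls. It assumes the 500 output tokens are also a per-call average, which the profile does not state explicitly; the function name is illustrative.

```python
# Defaults taken from the cost profile above.
CALLS_PER_REQUEST = 4
AVG_INPUT_TOKENS = 2000   # per call, including retrieved context
AVG_OUTPUT_TOKENS = 500   # assumed to be per call as well


def tokens_per_request(calls=CALLS_PER_REQUEST,
                       avg_input=AVG_INPUT_TOKENS,
                       avg_output=AVG_OUTPUT_TOKENS):
    """Return (input_tokens, output_tokens) for one user request."""
    return calls * avg_input, calls * avg_output


for calls in (3, 4, 5):
    input_tokens, output_tokens = tokens_per_request(calls=calls)
    print(f"{calls} calls: {input_tokens} input, {output_tokens} output tokens")
```

At the default of 4 calls, one user question costs about 8,000 input tokens and 2,000 output tokens, roughly four times what a single-call chatbot with the same prompt size would use.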

Rough cost

$30–500/mo at 100–1,000 users/day. Prompt caching is critical — enable it.
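A monthly estimate can be sketched from the same profile. The prices below are placeholders, not any provider's actual rates, and the caching parameters (`cached_fraction`, `cached_price_multiplier`) are illustrative assumptions about how much input is served from cache and how steeply it is discounted. The sketch also assumes one request per user per day and a 30-day month.

```python
def monthly_cost_usd(
    requests_per_day: int,
    input_price_per_mtok: float,           # USD per million input tokens (placeholder)
    output_price_per_mtok: float,          # USD per million output tokens (placeholder)
    cached_fraction: float = 0.0,          # assumed share of input tokens read from cache
    cached_price_multiplier: float = 1.0,  # assumed price of cached input relative to normal input
    calls_per_request: int = 4,
    avg_input_tokens: int = 2000,
    avg_output_tokens: int = 500,
    days_per_month: int = 30,
) -> float:
    """Estimate monthly LLM spend for a multi-call RAG pipeline."""
    calls = requests_per_day * days_per_month * calls_per_request
    input_tokens = calls * avg_input_tokens
    output_tokens = calls * avg_output_tokens

    # Cached input tokens are billed at a discounted rate; the rest at full price.
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    billable_input = uncached + cached * cached_price_multiplier

    total = billable_input * input_price_per_mtok + output_tokens * output_price_per_mtok
    return total / 1_000_000


# Placeholder prices: $1.00 per million input tokens, $5.00 per million output tokens.
no_cache = monthly_cost_usd(100, 1.00, 5.00)
with_cache = monthly_cost_usd(100, 1.00, 5.00,
                              cached_fraction=0.5, cached_price_multiplier=0.1)
print(f"no caching: ${no_cache:.2f}/mo, with caching: ${with_cache:.2f}/mo")
```

Swap in your provider's current prices and your own measured cache hit rate; the gap between the two numbers shows how much of the input bill caching can remove.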

Red flag

If you're only doing one LLM call per question (no retrieval step), this is probably a Simple chatbot or Chatbot with history.

Model costs for RAG pipeline