LLM cost archetype
Coding assistant — LLM cost calculator & pricing model
Your app helps developers write, review, explain, or debug code. Prompts are large (code files, diffs, instructions) and outputs are large (generated code, explanations), so token costs per request are higher than for a typical chatbot.
Does this sound like your app?
- □ Users paste code or file contents into the prompt
- □ The AI generates substantial code in response (not just a short answer)
- □ Input prompts are typically 2,000+ tokens
- □ Output responses are typically 500–2,000 tokens
Real-world example
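A PR review tool. A developer opens a pull request; the app sends the entire diff (3,000 tokens) to an LLM and gets back a detailed code review (1,200 tokens). Output pricing matters a lot here: output tokens are usually priced higher per token than input tokens, so the 1,200-token review can account for a large share of the cost of each call.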
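To make the example concrete, here is a minimal sketch of the cost of one such review. The prices are hypothetical placeholders, not any provider's real rates; they only reflect the common pattern that output tokens cost more per token than input tokens. The function name `call_cost` is also an assumption made for illustration.

```python
# Hypothetical prices in USD per 1M tokens; substitute your provider's real rates.
INPUT_PRICE_PER_M = 1.00
OUTPUT_PRICE_PER_M = 5.00


def call_cost(input_tokens: int, output_tokens: int,
              input_price_per_m: float = INPUT_PRICE_PER_M,
              output_price_per_m: float = OUTPUT_PRICE_PER_M) -> dict:
    """Return input, output, and total cost in USD for one LLM call."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    total = input_cost + output_cost
    return {
        "input": input_cost,
        "output": output_cost,
        "total": total,
        # Fraction of the call's cost that comes from output tokens.
        "output_share": output_cost / total if total else 0.0,
    }


# The PR review from the example: 3,000-token diff in, 1,200-token review out.
review = call_cost(input_tokens=3_000, output_tokens=1_200)
print(f"total ${review['total']:.4f}, output share {review['output_share']:.0%}")
```

Under these assumed prices, the 1,200 output tokens account for about two thirds of the cost of the call, even though the input is 2.5 times larger in token count.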
Default cost profile
- Calls per request: 1.5
- Batch-eligible: no
- Avg input tokens: 4,000
- Avg output tokens: 1,500
Assumes 1–2 LLM calls per request (default 1.5): sometimes a single generation, sometimes a generate-then-review pattern. Input tokens are high (~4,000) because prompts include code context, file contents, and instructions. Output tokens are also high (~1,500) for generated code. Caching helps when the same project context is sent repeatedly. Not batch-eligible, because developers expect real-time responses.
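The default profile can be turned into a monthly token volume with simple arithmetic. The sketch below assumes a 30-day month and ignores caching; the `CostProfile` class and `monthly_tokens` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostProfile:
    calls_per_request: float
    avg_input_tokens: int
    avg_output_tokens: int
    batch_eligible: bool


# Default values from the cost profile above.
CODING_ASSISTANT = CostProfile(
    calls_per_request=1.5,
    avg_input_tokens=4000,
    avg_output_tokens=1500,
    batch_eligible=False,
)


def monthly_tokens(profile: CostProfile, requests_per_day: float,
                   days_per_month: int = 30) -> tuple[float, float]:
    """Return (input_tokens, output_tokens) per month; 30 days is an assumption."""
    calls = requests_per_day * profile.calls_per_request * days_per_month
    return calls * profile.avg_input_tokens, calls * profile.avg_output_tokens


low_in, low_out = monthly_tokens(CODING_ASSISTANT, requests_per_day=100)
high_in, high_out = monthly_tokens(CODING_ASSISTANT, requests_per_day=1000)
print(f"100 req/day:  {low_in / 1e6:.2f}M input, {low_out / 1e6:.2f}M output")
print(f"1000 req/day: {high_in / 1e6:.2f}M input, {high_out / 1e6:.2f}M output")
```

At 100 requests/day this works out to 18M input tokens and 6.75M output tokens per month; at 1,000 requests/day both figures are ten times larger.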
Rough cost
$20–300/mo at 100–1,000 requests/day. This archetype is output-heavy, so choose models with competitive output pricing.
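One way to act on this advice is to compare candidate models on total monthly cost under the default profile rather than on input price alone. The sketch below uses two hypothetical models with made-up prices (`model_a`, `model_b`), assumes a 30-day month, and ignores caching discounts; it is not meant to reproduce the range quoted above.

```python
# Default profile from this page; 30 days per month is an assumption.
CALLS_PER_REQUEST = 1.5
AVG_INPUT_TOKENS = 4000
AVG_OUTPUT_TOKENS = 1500
DAYS_PER_MONTH = 30

# Hypothetical models and prices (USD per 1M tokens), for illustration only.
MODELS = {
    "model_a": {"input": 1.00, "output": 5.00},
    "model_b": {"input": 1.50, "output": 3.00},
}


def monthly_cost(requests_per_day: float, prices: dict) -> float:
    """Estimate monthly USD cost for the default coding-assistant profile."""
    calls = requests_per_day * CALLS_PER_REQUEST * DAYS_PER_MONTH
    input_cost = calls * AVG_INPUT_TOKENS / 1_000_000 * prices["input"]
    output_cost = calls * AVG_OUTPUT_TOKENS / 1_000_000 * prices["output"]
    return input_cost + output_cost


costs = {name: monthly_cost(100, prices) for name, prices in MODELS.items()}
cheapest = min(costs, key=costs.get)

# Print models from cheapest to most expensive.
for name, cost in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{name}: ${cost:.2f}/mo")
```

With these assumed prices, `model_b` comes out cheaper at 100 requests/day ($47.25 versus $51.75 per month) despite its higher input price, because its lower output price outweighs the difference.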
Red flag
If users are asking general programming questions without pasting code, this might be a Simple chatbot.