LLM cost archetype

Coding assistant — LLM cost calculator & pricing model

Your app helps developers write, review, explain, or debug code. Prompts are large (code files, diffs, instructions) and outputs are large (generated code, explanations). Token costs are higher than for a typical chatbot.

Does this sound like your app?

□ Users paste code or file contents into the prompt
□ The AI generates substantial code in response (not just a short answer)
□ Input prompts are typically 2,000+ tokens
□ Output responses are typically 500–2,000 tokens

Real-world example

A PR review tool. Developer opens a pull request — the app sends the entire diff (3,000 tokens) to an LLM and gets back a detailed code review (1,200 tokens). Output pricing matters a lot here.

Default cost profile

Calls per request: 1.5
Batch-eligible: no
Avg input tokens: 4,000
Avg output tokens: 1,500

Assumes 1–2 LLM calls per request (default 1.5): sometimes a single generation, sometimes a generate-then-review pattern. Input tokens are high (~4,000) because prompts include code context, file contents, and instructions. Output tokens are also high (~1,500) for generated code. Caching helps with repeated project context. Not batch-eligible — developers expect real-time responses.
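The default profile above can be turned into a monthly estimate with a few lines of arithmetic. The following is a minimal sketch, not the calculator's actual implementation: the names `CostProfile` and `monthly_cost` are hypothetical, the 30-day month is an assumption, and the per-million-token prices in the usage line are placeholders rather than any real model's rates. Caching discounts are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostProfile:
    """Default cost profile for the Coding assistant archetype."""
    calls_per_request: float = 1.5
    avg_input_tokens: int = 4000
    avg_output_tokens: int = 1500


def monthly_cost(requests_per_day: float,
                 input_price_per_mtok: float,
                 output_price_per_mtok: float,
                 profile: CostProfile = CostProfile(),
                 days_per_month: int = 30) -> float:
    """Estimate monthly spend in dollars.

    Prices are dollars per one million tokens. The 30-day month is an
    assumption, and no caching discount is applied.
    """
    calls = requests_per_day * days_per_month * profile.calls_per_request
    input_cost = calls * profile.avg_input_tokens / 1_000_000 * input_price_per_mtok
    output_cost = calls * profile.avg_output_tokens / 1_000_000 * output_price_per_mtok
    return input_cost + output_cost


# Placeholder prices (hypothetical): $1.00 per 1M input tokens, $4.00 per 1M output tokens.
estimate = monthly_cost(requests_per_day=100,
                        input_price_per_mtok=1.00,
                        output_price_per_mtok=4.00)
print(f"${estimate:.2f} per month")
```

At 100 requests per day the profile works out to 4,500 calls, 18 million input tokens, and 6.75 million output tokens per month, so the placeholder prices give $18 for input plus $27 for output.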

Rough cost

Roughly $20–300 per month at 100–1,000 requests per day, depending on the model. Output tokens make up a large share of the bill, so choose models with competitive output pricing.
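To see why output pricing carries so much weight even though the profile has more input tokens than output tokens, it may help to compute the output share of per-call cost. This sketch assumes output tokens are priced at some multiple of input tokens; that ratio is an illustrative assumption, not a figure from this page, and `output_cost_share` is a hypothetical helper name.

```python
def output_cost_share(input_tokens: int,
                      output_tokens: int,
                      output_to_input_price_ratio: float) -> float:
    """Fraction of a call's cost that comes from output tokens.

    Costs are measured in units of the input-token price, so only the
    price ratio matters, not the absolute prices.
    """
    input_units = float(input_tokens)
    output_units = output_tokens * output_to_input_price_ratio
    return output_units / (input_units + output_units)


# Default profile: 4,000 input tokens and 1,500 output tokens per call.
# The ratios below are assumptions chosen for illustration.
for ratio in (1.0, 4.0):
    share = output_cost_share(4000, 1500, ratio)
    print(f"price ratio {ratio:g}x -> output share {share:.0%}")
```

With equal prices, output is about 27% of the cost; if output tokens cost four times as much as input tokens, output rises to 60% of the cost.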

Red flag

If users are asking general programming questions without pasting code, your app may fit the Simple chatbot archetype better.
