Which archetype is your app?

Select the pattern that best matches how your app uses LLMs. Each archetype has different cost drivers — getting this right makes the estimate meaningful.

Simple chatbot

Your app takes a user message, sends it to an LLM with some instructions, and returns a response. No memory between sessions. Every conversation starts fresh.

Does this sound like your app?

□ Users ask one-off questions and get answers
□ Each conversation is independent, with no history carried over
□ Your system prompt is fixed and doesn't change per user
□ Response time matters because users are waiting in real time

Real-world example

A customer support widget on an e-commerce site. User types 'where is my order?' — the LLM responds using a fixed system prompt about the company's policies. Every session is identical in structure.

Rough cost: $1–30/mo at 100–1,000 users/day depending on model.
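To see where a figure like this comes from, here is a minimal sketch of the arithmetic. The per-token prices, the token counts, and the "one session per user per day" rule are all placeholder assumptions, not any provider's real rates, so swap in your own numbers.

```python
# Placeholder prices in USD per million tokens (assumptions, not real rates).
INPUT_PRICE_PER_M = 0.25
OUTPUT_PRICE_PER_M = 2.00


def call_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one LLM call at the placeholder prices above."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


def monthly_cost(users_per_day: int, input_tokens: int, output_tokens: int,
                 days: int = 30) -> float:
    """USD cost per month, assuming one single-call session per user per day."""
    return users_per_day * days * call_cost(input_tokens, output_tokens)


# Assumed session shape: 500-token system prompt + 50-token question in,
# 200-token answer out.
low = monthly_cost(100, 550, 200)
high = monthly_cost(1_000, 550, 200)
print(f"${low:.2f} to ${high:.2f} per month")
```

With these placeholder inputs the result is roughly $1.60 to $16 per month. A pricier model or longer answers move it toward the top of the quoted range.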

Red flag:

If you're including the last 5 messages in every prompt, this is actually a Chatbot with history.

Chatbot with history

Like a simple chatbot, but you include previous messages in every new prompt so the LLM 'remembers' the conversation. The more turns, the bigger (and more expensive) each call gets.

Does this sound like your app?

□ Users have multi-turn conversations in which the AI refers back to earlier messages
□ You pass conversation history into every API call
□ Sessions can last 10+ messages
□ Users expect the AI to remember what they said earlier in the same chat

Real-world example

An AI sales assistant that qualifies leads over a multi-turn conversation. By turn 8, the prompt includes the system prompt + 7 prior exchanges — easily 4,000–6,000 tokens per call.

Rough cost: $10–150/mo at 100–1,000 users/day. Grows fast with session length.
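The "grows fast" part is easier to see in numbers. The sketch below counts input tokens per turn and per session, assuming every call re-sends the system prompt plus all prior exchanges. The token sizes (800 for the system prompt, 600 per prior exchange, 100 per new user message) are illustrative assumptions chosen to land inside the range from the example.

```python
def prompt_tokens_at_turn(turn: int, system_tokens: int = 800,
                          exchange_tokens: int = 600,
                          user_tokens: int = 100) -> int:
    """Input tokens for the call made at `turn` (1-based).

    The prompt carries the system prompt, every prior exchange
    (user message + AI reply), and the new user message.
    """
    return system_tokens + (turn - 1) * exchange_tokens + user_tokens


def session_input_tokens(turns: int) -> int:
    """Total input tokens billed across a whole session of `turns` calls."""
    return sum(prompt_tokens_at_turn(t) for t in range(1, turns + 1))


for turns in (2, 8, 16):
    print(turns, prompt_tokens_at_turn(turns), session_input_tokens(turns))
```

Under these assumptions, turn 8 sends 5,100 input tokens. Doubling a session from 8 to 16 turns raises total input tokens from 24,000 to 86,400 (3.6×), so cost grows faster than session length.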

Red flag:

If conversations are always 1–2 turns, use Simple chatbot instead.

RAG pipeline

Your app searches a knowledge base (documents, database, website) and uses what it finds to answer the question. Under the hood, this usually takes several LLM calls (for example query rewriting, reranking, and answer generation), not just one.

Does this sound like your app?

□ Your app searches documents, a knowledge base, or a database before answering
□ You use a vector database (Pinecone, Weaviate, pgvector, etc.)
□ Each user question triggers multiple LLM calls behind the scenes
□ Answers are grounded in specific source documents

Real-world example

An internal company knowledge base. Employee asks 'what's our parental leave policy?' — the app rewrites the query, searches HR documents, reranks results, then generates an answer citing the relevant policy. 3–5 LLM calls per question.

Rough cost: $30–500/mo at 100–1,000 users/day. Prompt caching is critical — enable it.
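A rough way to reason about this is to add up every LLM call a single question triggers, then see how much of each prompt is a stable prefix that could be cached. Everything in this sketch is an assumption for illustration: the three-call pipeline, the token counts, the prices, and the rule that cached input tokens are billed at 10% of the normal input price. It also ignores any cache-write surcharge and embedding costs.

```python
from dataclasses import dataclass

INPUT_PRICE_PER_M = 1.00        # placeholder USD per million input tokens
OUTPUT_PRICE_PER_M = 4.00       # placeholder USD per million output tokens
CACHED_INPUT_MULTIPLIER = 0.1   # assumption: cached input billed at 10%


@dataclass
class Call:
    name: str
    input_tokens: int
    output_tokens: int
    cacheable_tokens: int = 0   # stable prompt prefix that can be cached


def question_cost(calls: list[Call], cache_enabled: bool) -> float:
    """USD cost of answering one question across all pipeline calls."""
    total = 0.0
    for c in calls:
        cached = c.cacheable_tokens if cache_enabled else 0
        fresh = c.input_tokens - cached
        total += (fresh * INPUT_PRICE_PER_M
                  + cached * INPUT_PRICE_PER_M * CACHED_INPUT_MULTIPLIER
                  + c.output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000
    return total


# Hypothetical pipeline: token counts are made up for illustration.
PIPELINE = [
    Call("rewrite_query", 400, 50, cacheable_tokens=300),
    Call("rerank", 2_500, 100, cacheable_tokens=500),
    Call("generate_answer", 4_000, 400, cacheable_tokens=1_500),
]

print(question_cost(PIPELINE, cache_enabled=False))
print(question_cost(PIPELINE, cache_enabled=True))
```

With these made-up numbers, caching lowers the per-question cost by roughly 23%. The saving scales with how much of each prompt is a repeated prefix.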

Red flag:

If you're only doing one LLM call per question (no retrieval step), this is probably a Simple chatbot or Chatbot with history.

Multi-model router

Your app uses a routing layer to decide which LLM to call based on the complexity or type of each request. Simple questions go to a cheap, fast model. Complex or sensitive tasks escalate to a frontier model. The cost depends almost entirely on how accurately the router classifies tasks.

Does this sound like your app?

□ You use more than one LLM, and different requests go to different models
□ You have a 'cheap path' (GPT-5.4 mini, Haiku, Gemini Flash) and an 'expensive path' (GPT-5.4, Claude Sonnet, Gemini Pro)
□ A classifier, rules engine, or small LLM decides which model to use
□ Routing accuracy directly affects your cost, so misrouting is a budget risk

Real-world example

A B2B support platform. Simple queries like 'reset my password' route to GPT-5.4 mini (~0.07¢). Complex queries like 'explain why my enterprise integration is failing' escalate to GPT-5.4 (~0.7¢). At these prices, misrouting 10% of simple queries to the expensive model nearly doubles the expected cost (0.9 × 0.07¢ + 0.1 × 0.7¢ ≈ 0.13¢ per request, versus 0.07¢).

Rough cost: $15–400/mo at 100–1,000 requests/day. Highly sensitive to model mix ratio.
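The sensitivity to model mix can be checked with a short blended-cost calculation. This sketch uses the per-request prices from the example. The function names and the definition of "misroute rate" (the fraction of cheap-path requests wrongly sent to the expensive model) are assumptions made for illustration.

```python
CHEAP_COST_CENTS = 0.07      # per request, from the example
EXPENSIVE_COST_CENTS = 0.7   # per request, from the example


def blended_cost(expensive_share: float) -> float:
    """Average cost per request (cents) for a given share on the expensive path."""
    return ((1 - expensive_share) * CHEAP_COST_CENTS
            + expensive_share * EXPENSIVE_COST_CENTS)


def misroute_multiplier(intended_expensive_share: float,
                        misroute_rate: float) -> float:
    """How much misrouting inflates cost relative to perfect routing.

    `misroute_rate` is the fraction of requests that belong on the cheap
    path but get sent to the expensive model.
    """
    actual_share = (intended_expensive_share
                    + (1 - intended_expensive_share) * misroute_rate)
    return blended_cost(actual_share) / blended_cost(intended_expensive_share)


print(misroute_multiplier(0.0, 0.10))   # every request should be cheap
print(misroute_multiplier(0.2, 0.10))   # 20% legitimately need the big model
```

With a 10× price gap, a 10% misroute rate gives about 1.9× the all-cheap cost, and about 1.26× when 20% of traffic already belongs on the expensive path. A wider price gap between the two models pushes the multiplier higher.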

Red flag:

If every request always goes to the same model regardless of complexity, this is not a multi-model router — use a simpler archetype.

Coding assistant

Your app helps developers write, review, explain, or debug code. Prompts are large (code files, diffs, instructions) and outputs are large (generated code, explanations). Token costs are higher than a typical chatbot.

Does this sound like your app?

□ Users paste code or file contents into the prompt
□ The AI generates substantial code in response (not just a short answer)
□ Input prompts are typically 2,000+ tokens
□ Output responses are typically 500–2,000 tokens

Real-world example

A PR review tool. Developer opens a pull request — the app sends the entire diff (3,000 tokens) to an LLM and gets back a detailed code review (1,200 tokens). Output pricing matters a lot here.

Rough cost: $20–300/mo at 100–1,000 requests/day. Output-heavy — choose models with competitive output pricing.
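To check how much output pricing matters for your own traffic, split each call's cost into input and output parts. The sketch below uses the token counts from the PR review example. The two models and their prices are hypothetical placeholders, chosen so that one has cheaper input and the other cheaper output.

```python
def review_cost(input_tokens: int, output_tokens: int,
                input_price_per_m: float,
                output_price_per_m: float) -> tuple[float, float]:
    """Return (USD cost of one call, share of that cost due to output tokens)."""
    input_cost = input_tokens * input_price_per_m / 1_000_000
    output_cost = output_tokens * output_price_per_m / 1_000_000
    total = input_cost + output_cost
    return total, output_cost / total


# Hypothetical models: (input USD per million, output USD per million).
MODELS = {
    "model_a": (1.00, 5.00),   # cheaper input, pricier output
    "model_b": (1.50, 3.00),   # pricier input, cheaper output
}

for name, (inp, out) in MODELS.items():
    cost, share = review_cost(3_000, 1_200, inp, out)
    print(f"{name}: ${cost:.4f} per review, {share:.0%} from output")
```

With these placeholder prices, output accounts for about two thirds of the cost on model_a even though the prompt is 2.5× longer than the response. model_b comes out cheaper per review despite its higher input price.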

Red flag:

If users are asking general programming questions without pasting code, this might be a Simple chatbot.

Document processor

Your app processes documents in bulk — summarizing, extracting data, classifying, or translating them. This runs in the background, not in real time. Because it's async, you can use cheaper batch pricing.

Does this sound like your app?

□ Documents are processed automatically, not triggered by a user waiting for a response
□ You process many documents at a time (invoices, contracts, reports, emails)
□ Results don't need to appear instantly; minutes or hours is fine
□ The input is always a document, not a conversational message

Real-world example

A legal tech tool that summarizes contracts overnight. 500 contracts uploaded — each goes through one LLM call to extract key clauses. No user is waiting. Batch API gives 50% off.

Rough cost: $10–200/mo at 1,000 documents/day. Batch API is your biggest lever — always enable it.
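In practice, not every document can wait, so the saving depends on what fraction of your volume is batch-eligible. Here is a minimal sketch of that trade-off, using the 50% batch discount from the example. The per-document cost of $0.003 is a placeholder assumption.

```python
BATCH_DISCOUNT = 0.5    # the 50% figure from the example
COST_PER_DOC = 0.003    # placeholder USD cost of one real-time call


def monthly_cost(docs_per_day: int, batch_fraction: float,
                 days: int = 30) -> float:
    """USD per month when `batch_fraction` of documents go through batch."""
    realtime_docs = docs_per_day * (1 - batch_fraction)
    batch_docs = docs_per_day * batch_fraction
    daily = (realtime_docs * COST_PER_DOC
             + batch_docs * COST_PER_DOC * (1 - BATCH_DISCOUNT))
    return daily * days


for fraction in (0.0, 0.8, 1.0):
    print(f"{fraction:.0%} batched: ${monthly_cost(1_000, fraction):.2f}/mo")
```

At 1,000 documents a day and this placeholder price, going from no batching to full batching halves the bill (about $90 to about $45 a month). Batching 80% of documents captures most of that saving.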

Red flag:

If users are waiting in real time for the result, this isn't batch-eligible and the cost model changes significantly.

Multi-step agent

Your app gives an AI a goal and lets it figure out the steps itself: using tools, making decisions, and iterating until it's done. Each step is a separate LLM call. A task that takes 10 steps costs at least 10× a single-call task, and more if each step re-sends the results of earlier steps.

Does this sound like your app?

□ The AI decides what to do next based on previous results
□ Your app uses tools such as web search, code execution, API calls, or file operations
□ A single user task triggers 5 or more LLM calls
□ The number of steps is unpredictable: some tasks take 3 steps, others take 15

Real-world example

An autonomous research agent. User says 'find me the 5 best competitors to my SaaS and summarize their pricing.' The agent searches the web, visits each site, extracts pricing, compares, and writes a summary — 8–12 LLM calls per task.

Rough cost: $50–1,000+/mo at 100–1,000 tasks/day. Call count is everything — reducing agent steps saves more than switching models.
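Why call count dominates is clearer once you account for context growth. The sketch below assumes each step re-sends the base prompt plus everything accumulated so far (tool results and earlier model output). That accumulation pattern and the token sizes are assumptions, and setting the growth to zero recovers the simple "N steps costs N×" rule.

```python
def agent_input_tokens(steps: int, base_prompt_tokens: int = 1_500,
                       tokens_added_per_step: int = 800) -> int:
    """Total input tokens for one task.

    Step i (0-based) sends the base prompt plus the context added by
    the i steps that came before it.
    """
    return sum(base_prompt_tokens + i * tokens_added_per_step
               for i in range(steps))


single = agent_input_tokens(1)
for steps in (6, 10):
    total = agent_input_tokens(steps)
    print(f"{steps} steps: {total} input tokens, {total / single:.0f}x one call")

# With no context growth, 10 steps cost exactly 10x one call.
print(agent_input_tokens(10, tokens_added_per_step=0) // single)
```

Under these assumptions a 10-step task uses 51,000 input tokens, 34× the single call, and trimming it to 6 steps cuts that to 21,000. That is a larger saving than most model switches would give at the same step count.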

Red flag:

If the number of LLM calls per task is always 1–2 and fixed in advance, this is probably a simpler archetype.
