LLM cost archetype
Document processor — LLM cost calculator & pricing model
Your app processes documents in bulk — summarizing, extracting data, classifying, or translating them. This runs in the background, not in real time. Because it's async, you can use cheaper batch pricing.
Does this sound like your app?
- □Documents are processed automatically, not triggered by a user waiting for a response
- □You process many documents at a time (invoices, contracts, reports, emails)
- □Results don't need to appear instantly — minutes or hours is fine
- □The input is always a document, not a conversational message
Real-world example
A legal tech tool that summarizes contracts overnight. 500 contracts uploaded — each goes through one LLM call to extract key clauses. No user is waiting. Batch API gives 50% off.
Default cost profile
- Calls per request
- 1
- Batch-eligible
- yes
- Avg input tokens
- 8000
- Avg output tokens
- 800
Assumes 1 LLM call per document with large input (~8,000 tokens of document content) and moderate output (~800 tokens of extracted/summarized data). This is the primary batch-eligible archetype — documents can be queued and processed asynchronously at ~50% off input and output pricing. Models with large context windows (200K+) are preferred since documents can exceed the 8K average.
Rough cost
$10–200/mo at 1,000 documents/day. Batch API is your biggest lever — always enable it.
Red flag
If users are waiting in real time for the result, this isn't batch-eligible and the cost model changes significantly.