LLM cost archetype

Document processor — LLM cost calculator & pricing model

Your app processes documents in bulk — summarizing, extracting data, classifying, or translating them. This runs in the background, not in real time. Because it's async, you can use cheaper batch pricing.

Does this sound like your app?

□Documents are processed automatically, not triggered by a user waiting for a response
□You process many documents at a time (invoices, contracts, reports, emails)
□Results don't need to appear instantly — minutes or hours is fine
□The input is always a document, not a conversational message

Real-world example

A legal tech tool that summarizes contracts overnight. 500 contracts uploaded — each goes through one LLM call to extract key clauses. No user is waiting. Batch API gives 50% off.

Default cost profile

Calls per request: 1
Batch-eligible: yes
Avg input tokens: 8000
Avg output tokens: 800

Assumes 1 LLM call per document with large input (~8,000 tokens of document content) and moderate output (~800 tokens of extracted/summarized data). This is the primary batch-eligible archetype — documents can be queued and processed asynchronously at ~50% off input and output pricing. Models with large context windows (200K+) are preferred since documents can exceed the 8K average.

Rough cost

$10–200/mo at 1,000 documents/day. Batch API is your biggest lever — always enable it.

Red flag

If users are waiting in real time for the result, this isn't batch-eligible and the cost model changes significantly.

Model costs for Document processor →← All archetypes