How LLM billing works
Almost every chat-completion API bills per million tokens, with separate prices for input (everything you send) and output (the model's reply). Output is typically 3–5× more expensive than input. A typical 5k-token prompt + 500-token reply on GPT-4o ($2.50/M in, $10/M out): $0.0125 input + $0.005 output = $0.0175 per call.
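The per-call arithmetic is just token counts times per-million rates. A minimal sketch using GPT-4o's list prices:

```js
// GPT-4o list prices, $ per million tokens
const IN_PRICE = 2.5;
const OUT_PRICE = 10;

const inputTokens = 5000;  // prompt, system message, history
const outputTokens = 500;  // the reply

const cost = (inputTokens / 1e6) * IN_PRICE
           + (outputTokens / 1e6) * OUT_PRICE;
// 0.0125 input + 0.005 output = $0.0175
```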
For most production workloads, output cost dominates because users want detailed responses. If you can replace verbose explanations with structured JSON or tool calls, you usually pay less.
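For example, at GPT-4o's $10/M output rate, trimming a reply from prose to terse structured JSON cuts the dominant cost directly (the token counts here are illustrative):

```js
const OUT_PRICE = 10; // $ per million output tokens (GPT-4o)

const proseReply = (500 / 1e6) * OUT_PRICE; // verbose explanation
const jsonReply = (80 / 1e6) * OUT_PRICE;   // terse structured JSON
// Same information, a fraction of the output spend
```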
FAQ
- Why "expected" output tokens — can't we count the real ones?
- You only know the actual output count after the call. For pre-call budgeting, estimate based on similar past calls.
`max_tokens` caps the worst case but doesn't predict the average.
- Where do prompt-caching discounts come in?
- OpenAI gives 50% on cache hits beyond a 1024-token threshold; Anthropic offers up to 90% off on cache reads with explicit cache breakpoints. Both apply to input cost only. This calc shows the un-cached price; reduce your real cache-hit input cost by the discount factor.
- What about batch APIs?
- OpenAI's Batch API is 50% off both input and output, with a 24-hour SLA. Halve the totals shown for batch workloads.
- How do tool / function definitions affect cost?
- They count as input tokens. A 5-tool schema is typically 200–800 tokens of overhead per call. Bake them into your input estimate.
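The adjustments covered in the answers above can be folded into a single pre-call estimate. A sketch, assuming GPT-4o list prices; `cacheHitRatio` and the token counts are hypothetical values you supply from your own traffic:

```js
// $ per million tokens (GPT-4o list prices)
const IN_PRICE = 2.5;
const OUT_PRICE = 10;

function estimateCost({
  inputTokens,          // prompt + history + tool definitions
  expectedOutputTokens,
  cacheHitRatio = 0,    // fraction of input tokens expected to hit the cache
  cacheDiscount = 0.5,  // 50% for OpenAI cache hits; up to 0.9 for Anthropic reads
  batch = false,        // OpenAI Batch API: 50% off input and output
}) {
  let inCost = (inputTokens / 1e6) * IN_PRICE;
  inCost *= 1 - cacheHitRatio * cacheDiscount; // caching discounts input only
  const outCost = (expectedOutputTokens / 1e6) * OUT_PRICE;
  let total = inCost + outCost;
  if (batch) total *= 0.5; // batch discount applies to both sides
  return total;
}

// 5,500 input tokens (incl. a ~500-token tool schema), 500 expected out,
// 80% cache hits, sent through the batch API:
estimateCost({
  inputTokens: 5500,
  expectedOutputTokens: 500,
  cacheHitRatio: 0.8,
  batch: true,
});
```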
Common pitfalls
- Pricing only on input. Output dominates many real workloads.
- Forgetting to count tool definitions, system prompt, and conversation history in the input.
- Comparing prices without comparing capabilities. GPT-4o-mini undercuts GPT-3.5-turbo on list price and adds vision and a larger context window, so a bare price column can steer you to the worse model.
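To avoid the undercounting pitfalls, estimate input as the sum of every component you actually send. A sketch with hypothetical token counts:

```js
// Everything in the request counts as input, not just the new user message.
function totalInputTokens({ systemPrompt, history, userTurn, toolDefs }) {
  return systemPrompt + history + userTurn + toolDefs;
}

const inTok = totalInputTokens({
  systemPrompt: 300, // instructions resent on every call
  history: 3200,     // prior turns; grows as the conversation continues
  userTurn: 150,     // the new message
  toolDefs: 500,     // 5-tool schema overhead
});
// 4,150 input tokens before the model writes a single output token
```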
In your code
npm i js-tiktoken

```js
import { encodingForModel } from 'js-tiktoken';

// $ per million tokens; add entries for other models as needed
const PRICE = { 'gpt-4o': { in: 2.5, out: 10 } };

export function priceCall(text, expectedOutputTokens, model = 'gpt-4o') {
  // Exact input count from the model's own tokenizer
  const inTok = encodingForModel(model).encode(text).length;
  const p = PRICE[model];
  return (inTok / 1e6) * p.in + (expectedOutputTokens / 1e6) * p.out;
}
```

Related tools
- LLM Pricing Comparison
List-price reference for chat-model APIs across major providers, sortable and reviewed monthly.
- LLM Capability Matrix
Which frontier models support multimodal, vision, audio, JSON mode, and tool calling.
- Monthly LLM Cost Estimator
Plug in daily request volume + average token sizes, see monthly spend per model side-by-side.
- Rate Limit / TPM Calculator
Sanity-check whether your traffic stays within RPM and TPM ceilings for a given model and tier.