How LLM billing works
Almost every chat-completion API bills per million tokens, with separate prices for input (everything you send) and output (the model's reply). Output is typically 3–5× more expensive than input. A typical 5k-token prompt + 500-token reply on GPT-4o ($2.50/M in, $10/M out): $0.0125 input + $0.005 output = $0.0175 per call.
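The per-call arithmetic is just token counts times per-million rates. A minimal sketch using GPT-4o's list prices:

```js
// GPT-4o list prices, $ per million tokens
const IN_PRICE = 2.5;
const OUT_PRICE = 10;

const inputTokens = 5000;  // prompt, system message, history
const outputTokens = 500;  // the reply

const cost = (inputTokens / 1e6) * IN_PRICE
           + (outputTokens / 1e6) * OUT_PRICE;
// 0.0125 input + 0.005 output = $0.0175
```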
For most production workloads, output cost dominates because users want detailed responses. If you can replace verbose explanations with structured JSON or tool calls, you usually pay less.
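For example, at GPT-4o's $10/M output rate, trimming a reply from prose to terse structured JSON cuts the dominant cost directly (the token counts here are illustrative):

```js
const OUT_PRICE = 10; // $ per million output tokens (GPT-4o)

const proseReply = (500 / 1e6) * OUT_PRICE; // verbose explanation
const jsonReply = (80 / 1e6) * OUT_PRICE;   // terse structured JSON
// Same information, a fraction of the output spend
```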
FAQ
- Why "expected" output tokens — can't we count the real ones?
- You only know the actual output count after the call. For pre-call budgeting, estimate based on similar past calls.
`max_tokens` caps the worst case but doesn't predict the average.
- Where do prompt-caching discounts come in?
- OpenAI gives 50% on cache hits beyond a 1024-token threshold; Anthropic offers up to 90% off on cache reads with explicit cache breakpoints. Both apply to input cost only. This calc shows the un-cached price; reduce your real cache-hit input cost by the discount factor.
- What about batch APIs?
- OpenAI's Batch API is 50% off both input and output, with a 24-hour SLA. Halve the totals shown for batch workloads.
- How do tool / function definitions affect cost?
- They count as input tokens. A 5-tool schema is typically 200–800 tokens of overhead per call. Bake them into your input estimate.
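The adjustments covered in the answers above can be folded into a single pre-call estimate. A sketch, assuming GPT-4o list prices; `cacheHitRatio` and the token counts are hypothetical values you supply from your own traffic:

```js
// $ per million tokens (GPT-4o list prices)
const IN_PRICE = 2.5;
const OUT_PRICE = 10;

function estimateCost({
  inputTokens,          // prompt + history + tool definitions
  expectedOutputTokens,
  cacheHitRatio = 0,    // fraction of input tokens expected to hit the cache
  cacheDiscount = 0.5,  // 50% for OpenAI cache hits; up to 0.9 for Anthropic reads
  batch = false,        // OpenAI Batch API: 50% off input and output
}) {
  let inCost = (inputTokens / 1e6) * IN_PRICE;
  inCost *= 1 - cacheHitRatio * cacheDiscount; // caching discounts input only
  const outCost = (expectedOutputTokens / 1e6) * OUT_PRICE;
  let total = inCost + outCost;
  if (batch) total *= 0.5; // batch discount applies to both sides
  return total;
}

// 5,500 input tokens (incl. a ~500-token tool schema), 500 expected out,
// 80% cache hits, sent through the batch API:
estimateCost({
  inputTokens: 5500,
  expectedOutputTokens: 500,
  cacheHitRatio: 0.8,
  batch: true,
});
```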
Common pitfalls
- Pricing only on input. Output dominates many real workloads.
- Forgetting to count tool definitions, system prompt, and conversation history in the input.
- Comparing prices without comparing capabilities. GPT-4o-mini undercuts GPT-3.5-turbo on list price and adds vision and a larger context window, so a bare price column can steer you to the worse model.
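To avoid the undercounting pitfalls, estimate input as the sum of every component you actually send. A sketch with hypothetical token counts:

```js
// Everything in the request counts as input, not just the new user message.
function totalInputTokens({ systemPrompt, history, userTurn, toolDefs }) {
  return systemPrompt + history + userTurn + toolDefs;
}

const inTok = totalInputTokens({
  systemPrompt: 300, // instructions resent on every call
  history: 3200,     // prior turns; grows as the conversation continues
  userTurn: 150,     // the new message
  toolDefs: 500,     // 5-tool schema overhead
});
// 4,150 input tokens before the model writes a single output token
```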
In your code
npm i js-tiktoken

```js
import { encodingForModel } from 'js-tiktoken';

// $ per million tokens; add entries for other models as needed
const PRICE = { 'gpt-4o': { in: 2.5, out: 10 } };

export function priceCall(text, expectedOutputTokens, model = 'gpt-4o') {
  // Exact input count from the model's own tokenizer
  const inTok = encodingForModel(model).encode(text).length;
  const p = PRICE[model];
  return (inTok / 1e6) * p.in + (expectedOutputTokens / 1e6) * p.out;
}
```

Related tools
- LLM Pricing Comparison
List-price reference for chat-model APIs across major providers, sortable and reviewed monthly.
- LLM Capability Matrix
Which frontier models support multimodal, vision, audio, JSON mode, and tool calling.
- Monthly LLM Cost Estimator
Plug in daily request volume + average token sizes, see monthly spend per model side-by-side.
- Rate Limit / TPM Calculator
Sanity-check whether your traffic stays within RPM and TPM ceilings for a given model and tier.