
Per-call LLM Cost Calculator

Pick a model, paste your prompt, set expected output length, see the per-call cost.

Input cost: $0.0000 (0 tok × $2.50/M)
Output cost: $0.0050 (500 tok × $10.00/M)
Total per call: $0.0050

How LLM billing works

Almost every chat-completion API bills per million tokens, with separate prices for input (everything you send) and output (the model's reply). Output is typically 3–5× more expensive than input. A typical 5k-token prompt + 500-token reply on GPT-4o: $0.0125 input + $0.005 output = $0.0175 per call.
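That arithmetic is two multiplications and an add; a minimal sketch using the GPT-4o rates from the example above (verify rates against the provider's current price sheet before relying on them):

```javascript
// Per-million-token rates (the GPT-4o example rates above; an assumption
// that can go stale — check the provider's current price sheet).
const INPUT_PER_M = 2.5;   // USD per 1M input tokens
const OUTPUT_PER_M = 10;   // USD per 1M output tokens

function perCallCost(inputTokens, outputTokens) {
  return (inputTokens / 1e6) * INPUT_PER_M + (outputTokens / 1e6) * OUTPUT_PER_M;
}

perCallCost(5000, 500); // ≈ $0.0175, matching the worked example
```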

For most production workloads, output cost dominates because users want detailed responses. If you can replace verbose explanations with structured JSON or tool calls, you usually pay less.

FAQ

Why "expected" output tokens — can't we count the real ones?
You only know the actual output count after the call. For pre-call budgeting, estimate based on similar past calls. max_tokens caps the worst case but doesn't predict the average.
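One way to turn "similar past calls" into a number, as an illustrative sketch (the helper name and percentile approach are not part of any API): take a percentile of recent output token counts.

```javascript
// Estimate expectedOutputTokens from the output counts of recent similar
// calls. percentile = 0.5 gives a median; use 0.9+ for a conservative
// budget. Illustrative helper, not a library function.
function expectedOutputTokens(pastCounts, percentile = 0.5) {
  const sorted = [...pastCounts].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.floor(percentile * sorted.length));
  return sorted[idx];
}

expectedOutputTokens([420, 510, 480, 650, 500]); // → 500 (the median)
```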
Where do prompt-caching discounts come in?
OpenAI gives 50% on cache hits beyond a 1024-token threshold; Anthropic offers up to 90% off on cache reads with explicit cache breakpoints. Both apply to input cost only. This calc shows the un-cached price; reduce your real cache-hit input cost by the discount factor.
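Folding a cache discount into the estimate is one extra term; a sketch (helper name and parameters are illustrative):

```javascript
// Effective input cost when `cachedFraction` of input tokens hit the cache.
// discount: 0.5 for OpenAI cache hits, up to 0.9 for Anthropic cache reads,
// per the rates quoted above. pricePerM is the un-cached input rate in USD.
function effectiveInputCost(inputTokens, pricePerM, cachedFraction, discount) {
  const fullRateTokens = (1 - cachedFraction) * inputTokens; // billed in full
  const discountedTokens = cachedFraction * inputTokens * (1 - discount);
  return ((fullRateTokens + discountedTokens) / 1e6) * pricePerM;
}

// 5k-token prompt, 80% cache-hit at OpenAI's 50% discount:
effectiveInputCost(5000, 2.5, 0.8, 0.5); // ≈ $0.0075 vs $0.0125 un-cached
```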
What about batch APIs?
OpenAI's Batch API is 50% off both input and output, with a 24-hour SLA. Halve the totals shown for batch workloads.
How do tool / function definitions affect cost?
They count as input tokens. A 5-tool schema is typically 200–800 tokens of overhead per call. Bake them into your input estimate.

Common pitfalls

  • Pricing only on input. Output dominates many real workloads.
  • Forgetting to count tool definitions, system prompt, and conversation history in the input.
  • Comparing prices without comparing capabilities. GPT-3.5-turbo looks like the budget option, but GPT-4o-mini is both cheaper on input ($0.15/M vs $0.50/M) and more capable, with vision support and a larger context window.

In your code

Cost calc snippet (JavaScript)
npm i js-tiktoken
import { encodingForModel } from 'js-tiktoken';

// Prices in USD per 1M tokens; update from the provider's price sheet.
const PRICE = { 'gpt-4o': { in: 2.5, out: 10 } };

export function priceCall(text, expectedOutputTokens, model = 'gpt-4o') {
  const p = PRICE[model];
  if (!p) throw new Error(`no pricing entry for model: ${model}`);
  // Count input tokens with the model's own tokenizer.
  const inTok = encodingForModel(model).encode(text).length;
  return (inTok / 1e6) * p.in + (expectedOutputTokens / 1e6) * p.out;
}

Related tools