How to read the prices
Prices are quoted per 1,000,000 tokens for both input (prompt + context + tool definitions) and output (model's reply). At GPT-4o's $2.50 / $10 input/output, a 5k-token prompt with a 500-token reply costs $0.0125 + $0.005 = $0.0175 per call. Multiply by your call volume to get monthly spend.
Output tokens cost more than input tokens at almost every provider — typically 3–5×. Architectures that produce short structured outputs (JSON mode, tool calls) are usually cheaper to run than ones that ask the model to "explain in detail".
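The arithmetic above can be sketched in a few lines. The `per_call_cost` helper below is ours, not part of any provider SDK; the prices are the GPT-4o list prices from the worked example:

```python
def per_call_cost(input_tokens, output_tokens, input_per_m, output_per_m):
    """USD cost of one API call; prices are quoted per 1M tokens."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000

# GPT-4o list prices: $2.50 input / $10.00 output per 1M tokens
print(per_call_cost(5_000, 500, 2.50, 10.00))    # the worked example: 0.0175

# Same prompt, verbose "explain in detail" reply with 4x the output tokens:
print(per_call_cost(5_000, 2_000, 2.50, 10.00))  # 0.0325, nearly double
```

Because output is billed at 4× input here, the longer reply almost doubles the per-call cost even though the prompt is unchanged.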
FAQ
- How often is this updated?
- Reviewed 2026-05-10. A scheduled GitHub Actions workflow opens an issue at the start of every month to bump prices. We deliberately don't auto-scrape; scraped prices drift out of date silently.
- Are these list prices or what I'll actually pay?
- List prices. Real bills can be lower (OpenAI's Batch API is 50% off; Anthropic prompt caching cuts input cost by 90% on cache hits) or higher (egress, longer-context surcharges on Gemini). Use the per-call calculator to model your actual workload.
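One way to fold those discounts into the list price is to compute a blended effective rate. The `effective_input_price` helper and its parameter names are ours (an illustrative sketch, not a provider formula); the 50% batch and 90% cache-hit figures come from the FAQ answer above:

```python
def effective_input_price(list_price, cache_hit_rate=0.0, cache_discount=0.90,
                          batch_discount=0.0):
    """Blended effective input price per 1M tokens.

    cache_hit_rate: fraction of input tokens served from cache (0-1)
    cache_discount: discount applied on cache hits (0.90 = pay 10%)
    batch_discount: flat discount on the whole call (0.50 for batch)
    """
    cached = list_price * (1 - cache_discount)  # price paid on cache hits
    blended = cache_hit_rate * cached + (1 - cache_hit_rate) * list_price
    return blended * (1 - batch_discount)

# 70% cache hit rate on a hypothetical $3.00/M input price:
print(effective_input_price(3.00, cache_hit_rate=0.7))   # 1.11

# Same price through a 50%-off batch endpoint, no caching:
print(effective_input_price(3.00, batch_discount=0.5))   # 1.5
```

Treating discounts as multipliers like this keeps the monthly-spend math a single multiplication against token volume.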
- Why no Bedrock / Vertex / Azure pricing?
- Resellers add their own surcharges and discounts. We cover provider-direct pricing only. For Bedrock / Vertex / Azure, multiply by the platform's factor (typically 1.0–1.1× direct).
- Why aren't embedding models in this table?
- See the embedding dimension reference under /tokens/. They're a different cost shape (input only, no output cost), so we keep them separate.
Related tools
- LLM Capability Matrix
Which frontier models support multimodal input (vision, audio), JSON mode, and tool calling.
- Per-call LLM Cost Calculator
Pick a model, paste your prompt, set expected output length, see the per-call cost.
- Monthly LLM Cost Estimator
Plug in daily request volume + average token sizes, see monthly spend per model side-by-side.
- Rate Limit / TPM Calculator
Sanity-check whether your traffic stays within RPM and TPM ceilings for a given model and tier.