
LLM Pricing Comparison

List-price reference for chat-model APIs across major providers, sortable and reviewed monthly.

Model               Provider    Context     Max output   Input $/M   Output $/M   Released
Gemini 2.0 Flash    Google AI   1,000,000   8,192        $0.10       $0.40        2025-02-05
GPT-4o mini         OpenAI      128,000     16,384       $0.15       $0.60        2024-07-18
DeepSeek V3         DeepSeek    128,000     8,192        $0.27       $1.10        2025-03-24
GPT-3.5 Turbo       OpenAI      16,385      4,096        $0.50       $1.50        2023-03-01
Claude 3.5 Haiku    Anthropic   200,000     8,192        $0.80       $4.00        2024-11-04
Gemini 2.0 Pro      Google AI   2,000,000   8,192        $1.25       $5.00        2025-02-05
Mistral Large 2     Mistral     128,000     4,096        $2.00       $6.00        2024-07-24
GPT-4o              OpenAI      128,000     16,384       $2.50       $10.00       2024-08-06
Claude 3.7 Sonnet   Anthropic   200,000     64,000       $3.00       $15.00       2025-02-24
GPT-4 Turbo         OpenAI      128,000     4,096        $10.00      $30.00       2024-04-09
Claude Opus 4.1     Anthropic   200,000     32,000       $15.00      $75.00       2025-08-05
GPT-4               OpenAI      8,192       8,192        $30.00      $60.00       2023-03-14

Sorted ascending by input price. Last reviewed 2026-05-10. Verify with the provider before basing infrastructure decisions on these numbers.

How to read the prices

Prices are quoted per 1,000,000 tokens for both input (prompt + context + tool definitions) and output (model's reply). At GPT-4o's $2.50 / $10 input/output, a 5k-token prompt with a 500-token reply costs $0.0125 + $0.005 = $0.0175 per call. Multiply by your call volume to get monthly spend.
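The arithmetic is simple enough to wrap in a few lines. A minimal Python sketch of the calculation above; the per_call_cost helper and the 100k-calls-per-month volume are illustrative, not something the page defines:

    def per_call_cost(input_tokens: int, output_tokens: int,
                      input_per_m: float, output_per_m: float) -> float:
        """Dollar cost of one call at list price (prices in $ per 1M tokens)."""
        return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000

    # GPT-4o at $2.50 / $10.00: 5k-token prompt, 500-token reply.
    call = per_call_cost(5_000, 500, 2.50, 10.00)
    print(f"${call:.4f} per call")                      # $0.0175
    print(f"${call * 100_000:,.2f}/mo at 100k calls")   # $1,750.00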

Output tokens cost more than input tokens at almost every provider — typically 3–5×. Architectures that produce short structured outputs (JSON mode, tool calls) are usually cheaper to run than ones that ask the model to "explain in detail".
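Reusing per_call_cost from the sketch above makes the point concrete; the reply lengths are illustrative assumptions, not measurements:

    # Same 5k-token prompt, two reply shapes, GPT-4o list prices.
    json_reply  = per_call_cost(5_000, 150, 2.50, 10.00)     # terse JSON / tool call
    prose_reply = per_call_cost(5_000, 1_500, 2.50, 10.00)   # "explain in detail"
    print(f"JSON  ${json_reply:.4f}")    # $0.0140
    print(f"prose ${prose_reply:.4f}")   # $0.0275

The verbose reply nearly doubles the per-call cost even though the prompt is identical.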

FAQ

How often is this updated?
Reviewed 2026-05-10. A scheduled GitHub Actions workflow opens an issue at the start of every month reminding us to re-check prices. We deliberately don't auto-scrape, because scraped prices drift out of date silently.
Are these list prices or what I'll actually pay?
List prices. Real bills can be lower (OpenAI's Batch API takes 50% off; Anthropic prompt caching takes 90% off input tokens on cache hits) or higher (egress, long-context surcharges on Gemini). Use the per-call calculator to model your actual workload.
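A rough sketch of how those two discounts land, again reusing per_call_cost from above. The 80% cache-hit share is an assumed figure for illustration, and Anthropic's cache-write surcharge is ignored:

    # OpenAI Batch API: 50% off both input and output (GPT-4o, $2.50 / $10.00).
    batch = per_call_cost(5_000, 500, 2.50 * 0.5, 10.00 * 0.5)   # ~$0.0088 vs $0.0175 at list

    # Anthropic prompt caching: cache reads at 10% of base input price
    # (Claude 3.7 Sonnet, $3.00 / $15.00; assume 4k of 5k prompt tokens cached).
    cache_reads = per_call_cost(4_000, 0, 3.00 * 0.10, 15.00)    # $0.0012
    fresh       = per_call_cost(1_000, 500, 3.00, 15.00)         # $0.0105
    print(f"cached ${cache_reads + fresh:.4f}")                  # $0.0117 vs $0.0225 at list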
Why no Bedrock / Vertex / Azure pricing?
Resellers add their own surcharges and discounts, so we cover provider-direct pricing only. For Bedrock / Vertex / Azure, multiply the direct price by the platform's markup (typically 1.0–1.1×).
Why aren't embedding models in this table?
See the embedding dimension reference under /tokens/. Embedding models are billed on input only, with no output cost, so we keep them in a separate table.

Related tools