How to read the prices
Prices are quoted per 1,000,000 tokens for both input (prompt + context + tool definitions) and output (model's reply). At GPT-4o's $2.50 / $10 input/output, a 5k-token prompt with a 500-token reply costs $0.0125 + $0.005 = $0.0175 per call. Multiply by your call volume to get monthly spend.
Output tokens cost more than input tokens at almost every provider — typically 3–5×. Architectures that produce short structured outputs (JSON mode, tool calls) are usually cheaper to run than ones that ask the model to "explain in detail".
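The arithmetic above can be sketched in a few lines. The `per_call_cost` helper below is ours, not part of any provider SDK; the prices are the GPT-4o list prices from the worked example:

```python
def per_call_cost(input_tokens, output_tokens, input_per_m, output_per_m):
    """USD cost of one API call; prices are quoted per 1M tokens."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000

# GPT-4o list prices: $2.50 input / $10.00 output per 1M tokens
print(per_call_cost(5_000, 500, 2.50, 10.00))    # the worked example: 0.0175

# Same prompt, verbose "explain in detail" reply with 4x the output tokens:
print(per_call_cost(5_000, 2_000, 2.50, 10.00))  # 0.0325, nearly double
```

Because output is billed at 4× input here, the longer reply almost doubles the per-call cost even though the prompt is unchanged.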
FAQ
- How often is this updated?
- Reviewed 2026-05-10. A scheduled GitHub Actions workflow opens an issue at the start of every month to bump prices. We deliberately don't auto-scrape; scraped prices drift out of date silently.
- Are these list prices or what I'll actually pay?
- List prices. Real bills can be lower (OpenAI's Batch API is 50% off; Anthropic prompt caching cuts input cost by 90% on cache hits) or higher (egress, longer-context surcharges on Gemini). Use the per-call calculator to model your actual workload.
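One way to fold those discounts into the list price is to compute a blended effective rate. The `effective_input_price` helper and its parameter names are ours (an illustrative sketch, not a provider formula); the 50% batch and 90% cache-hit figures come from the FAQ answer above:

```python
def effective_input_price(list_price, cache_hit_rate=0.0, cache_discount=0.90,
                          batch_discount=0.0):
    """Blended effective input price per 1M tokens.

    cache_hit_rate: fraction of input tokens served from cache (0-1)
    cache_discount: discount applied on cache hits (0.90 = pay 10%)
    batch_discount: flat discount on the whole call (0.50 for batch)
    """
    cached = list_price * (1 - cache_discount)  # price paid on cache hits
    blended = cache_hit_rate * cached + (1 - cache_hit_rate) * list_price
    return blended * (1 - batch_discount)

# 70% cache hit rate on a hypothetical $3.00/M input price:
print(effective_input_price(3.00, cache_hit_rate=0.7))   # 1.11

# Same price through a 50%-off batch endpoint, no caching:
print(effective_input_price(3.00, batch_discount=0.5))   # 1.5
```

Treating discounts as multipliers like this keeps the monthly-spend math a single multiplication against token volume.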
- Why no Bedrock / Vertex / Azure pricing?
- Resellers add their own surcharges and discounts. We cover provider-direct pricing only. For Bedrock / Vertex / Azure, multiply by the platform's factor (typically 1.0–1.1× direct).
- Why aren't embedding models in this table?
- See the embedding dimension reference under /tokens/. They're a different cost shape (input only, no output cost), so we keep them separate.
Related tools
- LLM Capability Matrix
Which frontier models support multimodal input (vision, audio), JSON mode, and tool calling.
- Per-call LLM Cost Calculator
Pick a model, paste your prompt, set expected output length, see the per-call cost.
- Monthly LLM Cost Estimator
Plug in daily request volume + average token sizes, see monthly spend per model side-by-side.
- Rate Limit / TPM Calculator
Sanity-check whether your traffic stays within RPM and TPM ceilings for a given model and tier.