Monthly LLM Cost Estimator

Plug in daily request volume and average token sizes to see monthly spend per model, side by side.

Prompt caching (optional)

The default 50% matches OpenAI's prompt-caching discount; Anthropic offers up to 90% off on cache reads. The discount applies to input cost only.

Model              Provider   Input cost  Output cost  Monthly total
Gemini 2.0 Flash   Google     $60.00      $48.00       $108.00
GPT-4o mini        OpenAI     $90.00      $72.00       $162.00
DeepSeek V3        DeepSeek   $162.00     $132.00      $294.00
GPT-3.5 Turbo      OpenAI     $300.00     $180.00      $480.00
Claude 3.5 Haiku   Anthropic  $480.00     $480.00      $960.00
Gemini 2.0 Pro     Google     $750.00     $600.00      $1.4k
GPT-4o             OpenAI     $1.5k       $1.2k        $2.7k
Claude 3.7 Sonnet  Anthropic  $1.8k       $1.8k        $3.6k
GPT-4 Turbo        OpenAI     $6.0k       $3.6k        $9.6k
Claude 3.7 Opus    Anthropic  $9.0k       $9.0k        $18.0k

Cheapest row highlighted. Total = (input tokens × $/M input rate) + (output tokens × $/M output rate). Excludes fixed costs (egress, dashboards, embeddings, vector DB).
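The total formula above can be sketched in a few lines of Python. The volumes and per-million prices below are illustrative assumptions (they happen to reproduce the first table row, not any provider's official rate card):

```python
def monthly_cost(requests_per_day, avg_input_tokens, avg_output_tokens,
                 price_in_per_m, price_out_per_m, days=30):
    """Monthly spend = (input tokens x $/M input) + (output tokens x $/M output)."""
    tokens_in = requests_per_day * avg_input_tokens * days
    tokens_out = requests_per_day * avg_output_tokens * days
    return (tokens_in / 1e6) * price_in_per_m + (tokens_out / 1e6) * price_out_per_m

# Assumed workload: 10k requests/day, 2,000 input + 400 output tokens,
# at an assumed $0.10/M input and $0.40/M output:
print(monthly_cost(10_000, 2_000, 400, 0.10, 0.40))  # → 108.0
```

Note that output tokens carry a 4× higher rate here, which is why a small output count can still move the bill.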

How to use this

Start with a realistic requests-per-day number — count actual production traffic, not optimistic projections. Set average input tokens to your typical prompt size (system prompt + RAG context + user message), and average output tokens to the response length you actually see in production.

If you've enabled prompt caching, set the cache hit rate based on your traffic pattern. Repeated system prompts on similar requests = high hit rate (50–90%). Highly varied per-request prompts = low hit rate (5–20%).
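The cache adjustment only touches the input side: effective input cost scales by (1 − hit rate × discount). A minimal sketch, with the hit rate chosen purely for illustration:

```python
def cached_input_cost(base_input_cost, cache_hit_rate, cache_discount=0.50):
    """Apply the prompt-caching discount to the input side only.

    cache_hit_rate: fraction of input tokens served from cache (0..1).
    cache_discount: price reduction on cached reads (0.50 = OpenAI's 50%).
    """
    return base_input_cost * (1 - cache_hit_rate * cache_discount)

# An assumed 80% hit rate at a 50% discount cuts input cost by 40%:
print(cached_input_cost(60.00, cache_hit_rate=0.80))  # → 36.0
```

Swap `cache_discount` for your provider's number (e.g. up to 0.90 for Anthropic cache reads); output cost is unaffected either way.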

FAQ

Why is the cheapest model highlighted but not always the right answer?
Cost ranking ignores quality. DeepSeek V3 might be 10× cheaper than GPT-4o for your workload but produce worse JSON-mode outputs. Eval first, optimise cost second.
How accurate is the prompt-caching discount?
For OpenAI, 50% off cached input is the documented number. For Anthropic, "up to 90%" depends on cache breakpoints and TTL. The default 50% is a conservative middle ground; tune for your provider.
What about embeddings, vector DB, hosting?
Not included. Those are flat-ish costs (per GB stored, per query). Add them separately. For most LLM-heavy workloads, inference dominates the bill.
How do I think about cost growth at 10× volume?
It's linear at list price, sub-linear with caching. If you hit serious volume (~$10k/mo+), negotiate enterprise pricing — providers will discount significantly.

Common pitfalls

  • Forgetting that output tokens dominate cost in many workloads.
  • Sizing on optimistic "10× growth" projections rather than measured traffic.
  • Counting only one model when shipping a router that fans out across several.
  • Ignoring the long tail — that one user who asks for 30k-token essays.
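The first pitfall is easy to check with back-of-the-envelope arithmetic. Using an assumed ~4× output-to-input price ratio (common across providers, but verify against your rate card):

```python
# Assumed prices: input $2.50/M, output $10.00/M (a common ~4x ratio).
input_cost = 2_000 / 1e6 * 2.50   # 2,000 input tokens per request
output_cost = 800 / 1e6 * 10.00   # 800 output tokens per request
print(output_cost > input_cost)   # → True: 2.5x fewer tokens, yet output costs more
```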

Related tools