Monthly LLM Cost Estimator

Plug in daily request volume and average token sizes to see monthly spend per model, side by side.

Prompt caching (optional)

The default 50% matches OpenAI's prompt-caching discount; Anthropic offers up to 90% off on cache reads. The discount applies to input cost only.

Model              Provider   Input cost  Output cost  Monthly total
Gemini 2.0 Flash   Google     $60.00      $48.00       $108.00
GPT-4o mini        OpenAI     $90.00      $72.00       $162.00
DeepSeek V3        DeepSeek   $162.00     $132.00      $294.00
GPT-3.5 Turbo      OpenAI     $300.00     $180.00      $480.00
Claude 3.5 Haiku   Anthropic  $480.00     $480.00      $960.00
Gemini 2.0 Pro     Google     $750.00     $600.00      $1.4k
GPT-4o             OpenAI     $1.5k       $1.2k        $2.7k
Claude 3.7 Sonnet  Anthropic  $1.8k       $1.8k        $3.6k
GPT-4 Turbo        OpenAI     $6.0k       $3.6k        $9.6k
Claude 3.7 Opus    Anthropic  $9.0k       $9.0k        $18.0k

Cheapest row highlighted. Total = (input tokens × $/M input rate) + (output tokens × $/M output rate). Excludes fixed costs (egress, dashboards, embeddings, vector DB).
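The total formula above can be sketched in a few lines of Python. The volumes and per-million prices below are illustrative assumptions (they happen to reproduce the first table row, not any provider's official rate card):

```python
def monthly_cost(requests_per_day, avg_input_tokens, avg_output_tokens,
                 price_in_per_m, price_out_per_m, days=30):
    """Monthly spend = (input tokens x $/M input) + (output tokens x $/M output)."""
    tokens_in = requests_per_day * avg_input_tokens * days
    tokens_out = requests_per_day * avg_output_tokens * days
    return (tokens_in / 1e6) * price_in_per_m + (tokens_out / 1e6) * price_out_per_m

# Assumed workload: 10k requests/day, 2,000 input + 400 output tokens,
# at an assumed $0.10/M input and $0.40/M output:
print(monthly_cost(10_000, 2_000, 400, 0.10, 0.40))  # → 108.0
```

Note that output tokens carry a 4× higher rate here, which is why a small output count can still move the bill.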

How to use this

Start with a realistic requests-per-day number — count actual production traffic, not optimistic projections. Set average input tokens to your typical prompt size (system prompt + RAG context + user message), and average output tokens to the response length you actually see in production.

If you've enabled prompt caching, set the cache hit rate based on your traffic pattern. Repeated system prompts on similar requests = high hit rate (50–90%). Highly varied per-request prompts = low hit rate (5–20%).
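The cache adjustment only touches the input side: effective input cost scales by (1 − hit rate × discount). A minimal sketch, with the hit rate chosen purely for illustration:

```python
def cached_input_cost(base_input_cost, cache_hit_rate, cache_discount=0.50):
    """Apply the prompt-caching discount to the input side only.

    cache_hit_rate: fraction of input tokens served from cache (0..1).
    cache_discount: price reduction on cached reads (0.50 = OpenAI's 50%).
    """
    return base_input_cost * (1 - cache_hit_rate * cache_discount)

# An assumed 80% hit rate at a 50% discount cuts input cost by 40%:
print(cached_input_cost(60.00, cache_hit_rate=0.80))  # → 36.0
```

Swap `cache_discount` for your provider's number (e.g. up to 0.90 for Anthropic cache reads); output cost is unaffected either way.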

FAQ

Why is the cheapest model highlighted but not always the right answer?
Cost ranking ignores quality. DeepSeek V3 might be 10× cheaper than GPT-4o for your workload but produce worse JSON-mode outputs. Eval first, optimise cost second.
How accurate is the prompt-caching discount?
For OpenAI, 50% off cached input is the documented number. For Anthropic, "up to 90%" depends on cache breakpoints and TTL. The default 50% is a conservative middle ground; tune for your provider.
What about embeddings, vector DB, hosting?
Not included. Those are flat-ish costs (per GB stored, per query). Add them separately. For most LLM-heavy workloads, inference dominates the bill.
How do I think about cost growth at 10× volume?
It's linear at list price, sub-linear with caching. If you hit serious volume (~$10k/mo+), negotiate enterprise pricing — providers will discount significantly.

Common pitfalls

  • Forgetting that output tokens dominate cost in many workloads.
  • Sizing on optimistic "10× growth" projections rather than measured traffic.
  • Counting only one model when shipping a router that fans out across several.
  • Ignoring the long tail — that one user who asks for 30k-token essays.
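The first pitfall is easy to check with back-of-the-envelope arithmetic. Using an assumed ~4× output-to-input price ratio (common across providers, but verify against your rate card):

```python
# Assumed prices: input $2.50/M, output $10.00/M (a common ~4x ratio).
input_cost = 2_000 / 1e6 * 2.50   # 2,000 input tokens per request
output_cost = 800 / 1e6 * 10.00   # 800 output tokens per request
print(output_cost > input_cost)   # → True: 2.5x fewer tokens, yet output costs more
```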

Related tools