Pick the cheapest model that has every flag you need
For most production workloads the right question isn't "which model is best?" but "which is the cheapest model that supports the flags I need?". If you need vision + tool calling + JSON mode, GPT-4o-mini ($0.15 / $0.60 per 1M input/output tokens) gives you all three at a fraction of GPT-4o's price.
Eval the cheapest qualifying model on your task before reaching for a flagship. The capability matrix narrows the list; an eval picks the winner.
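The "matrix narrows, eval picks" workflow can be sketched in a few lines. Everything here is illustrative: the model names, prices, and the 75/25 input/output blend are assumptions, not real provider data.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    input_price: float   # USD per 1M input tokens (illustrative list prices)
    output_price: float  # USD per 1M output tokens
    flags: frozenset     # capabilities the model's API supports natively

# Hypothetical capability matrix — check current provider docs for real values.
MATRIX = [
    Model("flagship-large",  2.50, 10.00, frozenset({"vision", "tools", "json_mode"})),
    Model("flagship-mini",   0.15,  0.60, frozenset({"vision", "tools", "json_mode"})),
    Model("text-only-small", 0.10,  0.40, frozenset({"tools", "json_mode"})),
]

def cheapest_qualifying(required, matrix=MATRIX, blend=(0.75, 0.25)):
    """Return the cheapest model supporting every required flag, or None.

    Cost is a blended per-1M-token price: blend[0]*input + blend[1]*output.
    """
    qualifying = [m for m in matrix if required <= m.flags]
    if not qualifying:
        return None
    return min(qualifying,
               key=lambda m: blend[0] * m.input_price + blend[1] * m.output_price)

print(cheapest_qualifying({"vision", "tools", "json_mode"}).name)  # flagship-mini
```

The filter step is cheap; the expensive step is the eval you then run on the one or two models that survive it.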
FAQ
- What does "JSON mode" mean exactly?
- A guarantee that the model returns syntactically valid JSON. It doesn't guarantee the output conforms to your schema; that's what tool calling + JSON Schema gives you. OpenAI calls it `response_format: { type: "json_object" }`; Anthropic surfaces it via tool calling.
- What about reasoning / chain-of-thought modes?
- Not yet a column. Reasoning capability is currently a model-level distinction (o1, o3, Claude's extended thinking) rather than a flag. We'll add a "reasoning" column when the row count of reasoning-equipped models warrants it.
- Why does Claude show audio = false but it can transcribe audio in some clients?
- The matrix tracks API-native capability, not what client wrappers (e.g., Claude Desktop with audio attachments) layer on top. Claude's API doesn't take raw audio yet.
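The valid-JSON vs. schema-conformance distinction from the first answer is easy to see locally. This is a minimal sketch with illustrative function names; production code would use a real JSON Schema validator rather than a hand-rolled key check.

```python
import json

def is_valid_json(text: str) -> bool:
    """JSON mode's guarantee: the string parses."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

def conforms(text: str, required_keys: set) -> bool:
    """What JSON mode does NOT guarantee: your schema.
    Minimal check — a real validator would also check types, ranges, etc."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and required_keys <= obj.keys()

reply = '{"sentiment": "positive"}'             # syntactically valid JSON
print(is_valid_json(reply))                     # True
print(conforms(reply, {"sentiment", "score"}))  # False: "score" is missing
```

A model in JSON mode will reliably pass the first check; only tool calling with a schema (or post-hoc validation plus retry) gets you the second.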
Related tools
- LLM Pricing Comparison
List-price reference for chat-model APIs across major providers, sortable and reviewed monthly.
- Per-call LLM Cost Calculator
Pick a model, paste your prompt, set expected output length, see the per-call cost.
- Monthly LLM Cost Estimator
Plug in daily request volume + average token sizes, see monthly spend per model side-by-side.
- Rate Limit / TPM Calculator
Sanity-check whether your traffic stays within RPM and TPM ceilings for a given model and tier.
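The per-call and monthly estimates above reduce to the same arithmetic, which is simple enough to sanity-check by hand. A minimal sketch; the request volume, token sizes, and prices in the example are illustrative only.

```python
def monthly_cost(requests_per_day: int,
                 avg_in_tokens: int, avg_out_tokens: int,
                 in_price_per_1m: float, out_price_per_1m: float,
                 days: int = 30) -> float:
    """Estimate monthly spend in USD from daily volume and average token sizes."""
    per_call = (avg_in_tokens * in_price_per_1m +
                avg_out_tokens * out_price_per_1m) / 1_000_000
    return per_call * requests_per_day * days

# e.g. 10k requests/day, 1,200 in / 300 out tokens, at $0.15 / $0.60 per 1M
print(round(monthly_cost(10_000, 1200, 300, 0.15, 0.60), 2))  # ≈ $108/month
```

Running the same numbers against two or three qualifying models side by side is usually all the "estimator" you need for a first pass.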