
Multi-model Token Compare

Side-by-side token counts for the same input across GPT-4o, GPT-4 Turbo, GPT-3.5, Claude, and Gemini.

Sample input: 115 chars

Token count per model (lowest: 40)

Model               Vendor     Tokens  Chars/token  Tokenizer
GPT-4o              OpenAI     40      2.88         o200k_base (cheapest)
GPT-4 Turbo         OpenAI     46      2.50         cl100k_base
GPT-3.5 Turbo       OpenAI     46      2.50         cl100k_base
Claude 3.7 Sonnet   Anthropic  40      2.88         approx via o200k
Gemini 2.0 Pro      Google     40      2.88         approx via o200k

A lower count means denser tokenization, and a cheaper call at the same per-token rate. The Anthropic and Google rows are approximate (typically within ~10%).

Why counts differ across models

Each model family is trained with its own BPE tokenizer. OpenAI's o200k_base (GPT-4o family) has ~200,000 sub-word units; cl100k_base (GPT-4, GPT-3.5) has ~100,000. Larger vocabularies map more characters to a single token, which means lower token counts for the same input — especially for non-English text where common multi-character sequences get their own merged token.
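
You can see the vocabulary effect directly by encoding the same string under both encodings with js-tiktoken's getEncoding (a minimal sketch; exact counts depend on the input):

import { getEncoding } from 'js-tiktoken';

const o200k = getEncoding('o200k_base');   // GPT-4o family
const cl100k = getEncoding('cl100k_base'); // GPT-4 / GPT-3.5

const text = 'Vocabulary size drives token density.';
console.log('o200k_base:', o200k.encode(text).length, 'tokens');
console.log('cl100k_base:', cl100k.encode(text).length, 'tokens');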

For pure English, the gap is usually 5–15%. For CJK or code, the o200k tokenizer can be 30–50% denser. That's why the same RAG passage might cost noticeably different amounts on GPT-4o vs GPT-3.5.
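
A quick way to measure the density gap on your own content (a sketch; the sample strings below are illustrative, and the 30–50% figure varies with the actual text):

import { getEncoding } from 'js-tiktoken';

// Characters per token: higher = denser tokenization.
function charsPerToken(text, encodingName) {
  return text.length / getEncoding(encodingName).encode(text).length;
}

const en = 'Retrieval-augmented generation grounds answers in documents.';
const ja = '検索拡張生成は回答を文書に基づかせる手法です。'; // illustrative CJK sample
for (const sample of [en, ja]) {
  console.log(charsPerToken(sample, 'o200k_base'), charsPerToken(sample, 'cl100k_base'));
}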

FAQ

Why does GPT-4o usually count fewer tokens than GPT-3.5?
GPT-4o uses o200k_base, which has roughly twice the vocabulary of cl100k_base (used by GPT-4 / GPT-3.5). Larger vocab packs more text per token, especially for multilingual content.
How accurate are the Claude / Gemini rows?
They're approximations using OpenAI's tokenizer as a proxy. In practice it's within ~10% for most English content; CJK can drift further. For accurate sizing, call Anthropic's count_tokens or Gemini's tokenization endpoint server-side.
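
A server-side sketch with the official SDKs (the model ids below are illustrative, and the exact SDK surface may vary by version):

import Anthropic from '@anthropic-ai/sdk';
import { GoogleGenerativeAI } from '@google/generative-ai';

const text = 'Your prompt here';

// Anthropic: count_tokens via the Messages API (reads ANTHROPIC_API_KEY).
const anthropic = new Anthropic();
const { input_tokens } = await anthropic.messages.countTokens({
  model: 'claude-3-7-sonnet-latest', // illustrative model id
  messages: [{ role: 'user', content: text }],
});

// Gemini: countTokens on a model instance.
const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({ model: 'gemini-2.0-pro-exp' }); // illustrative
const { totalTokens } = await model.countTokens(text);
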
Why care about the difference?
Two reasons: cost (you pay per input + output token) and context fit (different models have different windows). A prompt that fits comfortably in GPT-4o's 128k window costs less to run on GPT-4o-mini (same tokenizer, lower per-token rate), but its token count can balloon on GPT-3.5, which tokenizes heavy CJK content far less densely.
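
A back-of-the-envelope input-cost estimate (a sketch: the per-million-token rates below are placeholders, so substitute your provider's current pricing):

import { encodingForModel } from 'js-tiktoken';

// Placeholder $ per 1M input tokens -- not live prices.
const RATES = { 'gpt-4o': 2.5, 'gpt-4o-mini': 0.15, 'gpt-3.5-turbo': 0.5 };

function estimateInputCost(text, model) {
  const tokens = encodingForModel(model).encode(text).length;
  return { model, tokens, usd: (tokens / 1_000_000) * RATES[model] };
}
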
Why doesn't the page show DeepSeek / Mistral / Cohere?
Their tokenizers aren't available as a JS package today. We could approximate, but the approximation drifts more than it does for the OpenAI-aligned families; we'd rather omit than mislead.

Common pitfalls

  • Estimating token cost on one model and shipping with another, then being surprised by the bill.
  • Trusting a "Claude tokenizer" npm package that's actually just tiktoken renamed.
  • Forgetting that tool / function definitions count as input tokens too (see the sketch after this list).
  • Comparing chars, not tokens. Char-based estimates miss the real cost driver by a wide margin.
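
On the tool-definition pitfall: a rough way to gauge that overhead is to tokenize the serialized schema (a sketch; the provider's internal rendering of tool definitions differs from raw JSON, so treat this as an estimate only):

import { encodingForModel } from 'js-tiktoken';

const tools = [{
  type: 'function',
  function: {
    name: 'get_weather', // hypothetical tool
    description: 'Get the current weather for a city',
    parameters: {
      type: 'object',
      properties: { city: { type: 'string' } },
      required: ['city'],
    },
  },
}];

// Rough estimate: tokenize the JSON payload you would send.
const enc = encodingForModel('gpt-4o');
console.log(enc.encode(JSON.stringify(tools)).length, 'tokens (approx)');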

In your code

js-tiktoken (npm)
npm i js-tiktoken
import { encodingForModel } from 'js-tiktoken';

// encodingForModel maps a model name to its BPE encoding
// (gpt-4o -> o200k_base; gpt-4-turbo / gpt-3.5-turbo -> cl100k_base).
function compareModels(text) {
  return ['gpt-4o', 'gpt-4-turbo', 'gpt-3.5-turbo'].map(m => ({
    model: m,
    tokens: encodingForModel(m).encode(text).length,
  }));
}
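
Example usage (counts are illustrative and depend on the input):

console.log(compareModels('The quick brown fox jumps over the lazy dog.'));
// e.g. [ { model: 'gpt-4o', tokens: ... }, { model: 'gpt-4-turbo', tokens: ... }, ... ]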
