Why counts differ across models
Each model family is trained with its own BPE tokenizer. OpenAI's o200k_base (GPT-4o family) has ~200,000 sub-word units; cl100k_base (GPT-4, GPT-3.5) has ~100,000. A larger vocabulary packs more characters into each token on average, which means lower token counts for the same input, especially for non-English text where common multi-character sequences get their own merged token.
For pure English, the gap is usually 5–15%. For CJK or code, the o200k tokenizer can be 30–50% denser. That's why the same RAG passage might cost noticeably different amounts on GPT-4o vs GPT-3.5.
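You can reproduce the gap locally by encoding the same string with both vocabularies. A minimal sketch using js-tiktoken's getEncoding, assuming your installed version is recent enough to bundle o200k_base; the sample strings and resulting percentages are illustrative:

```js
import { getEncoding } from 'js-tiktoken';

const o200k = getEncoding('o200k_base');
const cl100k = getEncoding('cl100k_base');

const samples = {
  english: 'Tokenization splits text into sub-word units before the model sees it.',
  japanese: 'トークン化は、モデルが読む前にテキストを分割する処理です。',
};

for (const [label, text] of Object.entries(samples)) {
  const a = o200k.encode(text).length;
  const b = cl100k.encode(text).length;
  // "saving" is how much smaller the o200k count is relative to cl100k.
  console.log(label, { o200k: a, cl100k: b, saving: `${Math.round((1 - a / b) * 100)}%` });
}
```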
FAQ
- Why does GPT-4o usually count fewer tokens than GPT-3.5?
- GPT-4o uses `o200k_base`, which has roughly twice the vocabulary of `cl100k_base` (used by GPT-4 / GPT-3.5). A larger vocabulary packs more text into each token, especially for multilingual content.
- How accurate are the Claude / Gemini rows?
- They're approximations using OpenAI's tokenizer as a proxy. In practice that's within ~10% for most English content; CJK can drift further. For accurate sizing, call Anthropic's `count_tokens` or Gemini's tokenization endpoint server-side (see the sketch after this FAQ).
- Why care about the difference?
- Two reasons: cost (you pay per input and output token) and context fit (different models have different windows). A prompt that fits comfortably in GPT-4o's 128k window can cost less to run on GPT-4o-mini, but its token count can balloon on GPT-3.5 if it contains heavy CJK content.
- Why doesn't the page show DeepSeek / Mistral / Cohere?
- Their tokenizers aren't available as JS packages today. We could approximate, but the approximation drifts more than it does for the OpenAI-aligned families. We'd rather omit than mislead.
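For Claude, exact counts come from Anthropic's API rather than a local tokenizer. A minimal server-side sketch, assuming the current @anthropic-ai/sdk surface; the method and the model id are worth verifying against the SDK docs for the version you're on:

```js
import Anthropic from '@anthropic-ai/sdk';

// Server-side only: this needs ANTHROPIC_API_KEY and must never ship to the browser.
const client = new Anthropic();

export async function claudeTokenCount(text) {
  const res = await client.messages.countTokens({
    model: 'claude-3-5-sonnet-latest', // illustrative model id; use the one you deploy
    messages: [{ role: 'user', content: text }],
  });
  return res.input_tokens;
}
```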
Common pitfalls
- Estimating token cost on one model and shipping with another, then being surprised by the bill.
- Trusting a "Claude tokenizer" npm package that's actually just `tiktoken` renamed.
- Forgetting that tool / function definitions count as input tokens too; a rough way to budget for them is sketched after this list.
- Comparing characters, not tokens. Characters per token varies with language and content type, so character-based estimates miss the real cost driver by a wide margin.
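The tool-definition pitfall is easy to budget for: serialize the schema you send and count its tokens. This is only a rough lower bound, since the API's internal rendering of tool definitions isn't published; the get_weather tool below is a hypothetical example.

```js
import { encodingForModel } from 'js-tiktoken';

// Hypothetical tool definition, shaped like the chat-completions `tools` array.
const tools = [{
  type: 'function',
  function: {
    name: 'get_weather',
    description: 'Get the current weather for a city.',
    parameters: {
      type: 'object',
      properties: { city: { type: 'string' } },
      required: ['city'],
    },
  },
}];

// Rough lower bound: tokens in the JSON-serialized schema. The API's internal
// rendering differs from raw JSON, so leave some headroom on top of this.
const enc = encodingForModel('gpt-4o');
console.log(enc.encode(JSON.stringify(tools)).length);
```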
In your code
```bash
npm i js-tiktoken
```

```js
import { encodingForModel } from 'js-tiktoken';

// Encode the same text with each model's tokenizer and compare counts.
function compareModels(text) {
  return ['gpt-4o', 'gpt-4-turbo', 'gpt-3.5-turbo'].map(m => ({
    model: m,
    tokens: encodingForModel(m).encode(text).length,
  }));
}
```
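Called with a real passage, it returns one row per model, so you can eyeball the spread before committing to one:

```js
console.log(compareModels('The same passage, three different bills.'));
// => [ { model: 'gpt-4o', tokens: … }, { model: 'gpt-4-turbo', tokens: … }, … ]
```

Related tools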
- Token Counter: count tokens for GPT-4o, GPT-4, GPT-3.5 and more; tokenizer-exact, runs in your browser.
- Context Window Calculator: pick a model, paste your prompt, see how much context you have left after reserving output tokens.
- RAG Chunk Estimator: estimate chunk count and embedding token spend from chunk size + overlap + corpus size.
- Embedding Dimension Reference: reference table of embedding model output dimensions, max input tokens, and pricing.