
Token Counter

Count tokens for GPT-4o, GPT-4, GPT-3.5 and more — tokenizer-exact, runs in your browser.


How it works

Modern LLMs don't see characters — they see tokens, sub-word units produced by a Byte-Pair Encoding (BPE) tokenizer. Common English words are usually one token; rarer words split into multiple; non-Latin scripts often run a few characters per token.
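
To see the splitting first-hand, encode a few words and decode each token ID back to its text. A minimal sketch using js-tiktoken (the library behind this tool; installation is covered under "In your code" below):

import { encodingForModel } from 'js-tiktoken';

const enc = encodingForModel('gpt-4o');

for (const word of ['the', 'tokenizer', 'antidisestablishmentarianism']) {
  const ids = enc.encode(word);
  // Decode each ID individually to expose the sub-word pieces.
  const pieces = ids.map((id) => enc.decode([id]));
  console.log(word, '->', ids.length, 'token(s):', pieces);
}

The exact splits depend on the encoding; the point is that word length and token count are only loosely related.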

This tool runs OpenAI's reference tokenizer in your browser via js-tiktoken, a JavaScript port that mirrors the official library's API. The token IDs you see here match what OpenAI's tiktoken Python library produces, and what the model itself sees during inference.

The visualisation alternates background colours per token so you can spot how a tokenizer chunks text. Hover any token to see its numeric ID. · represents a space; ↵ represents a newline.
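
The data behind that view is straightforward to produce. A minimal sketch (not the actual component) that pairs each token ID with a display string, making whitespace visible:

import { encodingForModel } from 'js-tiktoken';

const enc = encodingForModel('gpt-4o');

// One entry per token: its ID plus its text with whitespace made visible.
function visualize(text: string): { id: number; display: string }[] {
  return enc.encode(text).map((id) => ({
    id,
    display: enc.decode([id]).replace(/ /g, '·').replace(/\n/g, '↵'),
  }));
}

Note that a multi-byte character split across two tokens decodes to � when each token is decoded on its own, which is exactly the replacement glyph you may see in the visualisation.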

Examples

  • "Hello, world!" 4 tokens (gpt-4o)
  • "你好,世界" 4 tokens (gpt-4o) · 11 tokens (gpt-3.5) o200k_base is much denser for CJK than cl100k_base.
  • "const x = 42;" 5 tokens

FAQ

Why does the same string have different token counts across models?
Different models use different tokenizers. GPT-4o uses o200k_base (≈200k vocab); GPT-4 and GPT-3.5 use cl100k_base (≈100k vocab). Larger vocabularies tend to fit more characters per token, especially for non-English text.
Does this work for Claude or Gemini?
Not yet — those tokenizers aren't publicly distributed in the same form. Anthropic's SDK has a count_tokens API, and Gemini exposes one too. A multi-provider counter is on the roadmap; for now, this tool covers OpenAI models exactly.
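
If you need Claude counts today, Anthropic's SDK can do it server-side. A sketch using the TypeScript SDK (@anthropic-ai/sdk); the method and model string reflect current SDK versions and may change:

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const result = await client.messages.countTokens({
  model: 'claude-3-5-sonnet-latest',
  messages: [{ role: 'user', content: 'Hello, world!' }],
});
console.log(result.input_tokens);

Unlike this page, that call does send your text to Anthropic's servers.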
Is this byte-for-byte accurate?
Yes for OpenAI. js-tiktoken is a JavaScript port of OpenAI's reference tokenizer and produces identical token IDs to the Python tiktoken library. We use it directly here.
How big is the bundle?
js-tiktoken ships the BPE rank tables for the encodings it supports. The full bundle is ~1MB gzipped. We load it on-demand only on this page (Astro island), so the rest of the site stays light.
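
The on-demand loading is a plain dynamic import, which any code-splitting bundler (including Astro's) turns into a separate chunk fetched on first use. A minimal sketch:

// Loaded lazily so the ~1MB of BPE rank tables never blocks the initial page.
async function countTokens(text: string): Promise<number> {
  const { encodingForModel } = await import('js-tiktoken');
  return encodingForModel('gpt-4o').encode(text).length;
}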
What does "chars / token" tell me?
A rough efficiency metric. English averages ~4 chars/token. CJK text can be ~1–1.5 chars/token (denser). Code with lots of symbols can be lower. Use it to sanity-check whether content is "tokenizer-friendly".
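
Computing it is one division:

import { encodingForModel } from 'js-tiktoken';

const enc = encodingForModel('gpt-4o');
const text = 'The quick brown fox jumps over the lazy dog.';
// chars/token: higher means the text is cheaper per character.
console.log((text.length / enc.encode(text).length).toFixed(2), 'chars/token');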
Is anything I paste sent to a server?
No. The BPE encoding runs entirely in your browser. Your text never leaves your machine.

Common pitfalls

  • Estimating tokens as "chars / 4": works for English, but badly under-counts for CJK and for symbol-heavy code (both run fewer chars per token).
  • Counting only input tokens; the model also bills output tokens, usually at a higher rate.
  • Forgetting that chat models also pay for the system prompt and message-format overhead, a few tokens per message (see the sketch after this list).
  • Assuming Claude / Gemini use the same tokenizer as GPT — they don't.
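
For the message-format overhead, OpenAI's cookbook documents roughly 3 wrapper tokens per message plus 3 to prime the reply on recent chat models. A sketch; treat the constants as approximations that vary by model:

import { encodingForModel } from 'js-tiktoken';

const enc = encodingForModel('gpt-4o');

// Approximate chat billing: ~3 wrapper tokens per message, plus the role
// and content, plus ~3 tokens priming the assistant's reply.
function countChatTokens(messages: { role: string; content: string }[]): number {
  let n = 3; // reply priming
  for (const m of messages) {
    n += 3 + enc.encode(m.role).length + enc.encode(m.content).length;
  }
  return n;
}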

In your code

js-tiktoken (JavaScript · npm)

npm i js-tiktoken

import { encodingForModel } from 'js-tiktoken';

const enc = encodingForModel('gpt-4o');
const tokens = enc.encode('Hello, world!');
console.log(tokens.length); // 4
tiktoken (OpenAI · Python · PyPI)

pip install tiktoken

import tiktoken

enc = tiktoken.encoding_for_model('gpt-4o')
tokens = enc.encode('Hello, world!')
print(len(tokens))  # 4
