What is an AI token, really?
Tokens aren't words and aren't characters. They're the chunks an AI model actually sees — and understanding the difference saves money and frustration.
by PromptCount Team
If you've used the OpenAI API or worked with prompt length limits, you've seen the word token. It's everywhere — pricing pages, model docs, error messages. Most explanations are vague: "tokens are roughly four characters" or "about ¾ of a word." Those rules of thumb are okay for back-of-envelope math, but they hide what's actually going on.
Here's a clearer picture.
What a token actually is
Language models don't read characters. They don't read words either. They read tokens — chunks of text that the model's tokenizer has decided are useful primitives.
The tokenizer is a piece of preprocessing that runs before the model ever sees your text. It takes your prompt and splits it into pieces, then turns each piece into a number (a token ID). The model only ever works with the numbers.
A typical tokenizer like cl100k_base (used by GPT-4 class models) has a vocabulary of about 100,000 tokens. The training process figures out which character sequences should be merged into single tokens and which should stay split.
The practical result:
| Text | Tokens |
|---|---|
| Hello | 1 token |
| Hello world | 2 tokens |
| Hello, world! | 4 tokens (`Hello`, `,`, ` world`, `!`) |
| internationalization | 1–2 tokens (common enough to be one) |
| pneumonoultramicroscopicsilicovolcanoconiosis | 7+ tokens (rare, split into pieces) |
| 你好世界 (Hello world in Chinese) | ~5–6 tokens |
Common words get their own tokens. Rare words get split. Punctuation usually gets its own token. Leading spaces are attached to the following word.
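You can inspect these splits yourself with OpenAI's open-source tiktoken library, which ships the cl100k_base encoding. A minimal sketch, assuming tiktoken is installed (pip install tiktoken):

```python
import tiktoken

# Load the cl100k_base encoding used by GPT-4-class models
enc = tiktoken.get_encoding("cl100k_base")

for text in ["Hello", "Hello world", "Hello, world!", "internationalization"]:
    token_ids = enc.encode(text)
    # Decode each ID on its own to see exactly how the text was split
    pieces = [enc.decode([tid]) for tid in token_ids]
    print(f"{text!r}: {len(token_ids)} tokens -> {pieces}")
```

Notice how the space rides along with the word that follows it, exactly as described above.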
Why this matters
Three concrete reasons.
1. Billing
Every commercial AI API charges by tokens, both input and output. When you send a 1,000-character prompt to GPT-4 and get a 2,000-character reply, you're not paying for 3,000 characters. You're paying for roughly 250 input tokens and 500 output tokens. Pricing pages quote rates per million tokens, but they assume you know how to count yours.
If you're running a tool at scale, even a 20% error in your token estimates compounds quickly against your budget. Better-than-rough estimates matter.
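The arithmetic itself is trivial once you have a count. A sketch with placeholder prices (the rates below are made up for illustration; substitute the real numbers from your provider's pricing page):

```python
# Illustrative rates only; check your provider's current pricing page.
INPUT_PRICE_PER_MILLION = 2.50    # dollars per million input tokens (assumed)
OUTPUT_PRICE_PER_MILLION = 10.00  # dollars per million output tokens (assumed)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the rates above."""
    return (input_tokens * INPUT_PRICE_PER_MILLION
            + output_tokens * OUTPUT_PRICE_PER_MILLION) / 1_000_000

# The 250-in / 500-out example from above, scaled to 100,000 requests
print(f"{request_cost(250, 500) * 100_000:.2f} dollars per 100k requests")
```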
2. Context limits
Every model has a maximum number of tokens it can process in one go. ChatGPT-class models have offered windows from 4K to 256K tokens. Claude has gone past 200K. Gemini 2.5 Pro is over a million.
But a context window isn't just for your prompt — it includes the system instructions, the conversation history, retrieved documents, and the model's reply. Filling 90% of the window with prompt leaves no room for output.
A practical rule: keep prompts under 70% of the window, and budget the rest for output and any system overhead.
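That rule is easy to enforce in code. A sketch, where the window size and the token count both come from whatever estimator you trust:

```python
def prompt_fits(prompt_tokens: int, context_window: int, max_share: float = 0.7) -> bool:
    """True if the prompt stays under the chosen share of the context window,
    leaving the rest for output, system instructions, and history."""
    return prompt_tokens <= context_window * max_share

print(prompt_fits(80_000, 128_000))   # True: 80k is under the 89.6k budget
print(prompt_fits(100_000, 128_000))  # False: no room left for the reply
```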
3. Quality
Long prompts often produce worse output. Repetition, drift, and lost focus all increase as a prompt grows. Counting tokens forces you to ask "what's actually essential here?"
How tokenization varies across models
| Model | Tokenizer | Notes |
|---|---|---|
| GPT-4 / GPT-4o | cl100k_base / o200k_base | Optimized for English + code |
| Claude Opus / Sonnet | Anthropic's own | Slightly different splits, similar density |
| Gemini | SentencePiece variant | Different on multilingual text |
| Llama | SentencePiece | Open weights, different vocabulary |
| CLIP (Stable Diffusion) | BPE, 77-token cap | Very different! See below |
For most Western text, the differences are small — within 10–20%. For Chinese, Japanese, code, and emoji, differences can be much larger.
The Midjourney / Stable Diffusion case
Image generators like Stable Diffusion don't use the same tokenizer as ChatGPT. They use CLIP's text encoder, which has a much smaller vocabulary and a hard limit of 77 tokens per prompt. Anything past 77 tokens is silently cut off.
This is why a long, detailed Stable Diffusion prompt that "looks fine" can produce results that ignore everything in the second half. The model never saw the second half.
Midjourney has its own tokenizer but shows similar sensitivity to prompt length: long prompts dilute strong signals.
If you write image prompts, your token budget is tighter, not looser. Two short prompts often beat one long one.
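If you want to check a prompt against the 77-token cap before generating, the Hugging Face transformers library bundles CLIP's tokenizer. A sketch, assuming the openai/clip-vit-base-patch32 checkpoint (the exact checkpoint your pipeline uses may differ):

```python
from transformers import CLIPTokenizer

# CLIP tokenizer as used by Stable Diffusion's text encoder (assumed checkpoint)
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

prompt = "a highly detailed oil painting of a lighthouse at dusk, volumetric light, 8k"
token_ids = tokenizer(prompt)["input_ids"]  # includes start-of-text and end-of-text tokens

limit = tokenizer.model_max_length  # 77 for CLIP
if len(token_ids) > limit:
    print(f"{len(token_ids)} tokens: everything past {limit} will be dropped silently.")
else:
    print(f"{len(token_ids)} of {limit} tokens used.")
```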
How we estimate tokens
The AI Prompt Counter on this site uses a calibrated heuristic:
- ~4 characters per token for Latin text
- ~1.5 characters per token for CJK text
- ~3 characters per token for digits
- ~6 characters per token for whitespace/punctuation
This is not a real tokenizer. It's a fast estimate calibrated against cl100k_base and o200k_base outputs on mixed inputs. The numbers are typically within 10% of the official count — close enough to budget against, not close enough to bill against.
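For the curious, here is a sketch of that kind of heuristic in Python. The character ranges and the rounding are assumptions for illustration, not the counter's actual implementation:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate from per-category characters-per-token ratios."""
    latin = cjk = digits = other = 0
    for ch in text:
        if "\u4e00" <= ch <= "\u9fff" or "\u3040" <= ch <= "\u30ff":
            cjk += 1      # CJK ideographs and Japanese kana (assumed ranges)
        elif ch.isdigit():
            digits += 1
        elif ch.isalpha():
            latin += 1    # treat all other letters as Latin-like
        else:
            other += 1    # whitespace, punctuation, symbols
    return round(latin / 4 + cjk / 1.5 + digits / 3 + other / 6)

print(estimate_tokens("Hello, world!"))  # 3, versus 4 from cl100k_base: close enough to budget with
```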
If you need exact counts, every major provider publishes their tokenizer or an API endpoint:
- OpenAI: the `tiktoken` library
- Anthropic: the Claude tokenizer endpoint
- Gemini: `count_tokens` in the SDK
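For OpenAI models, for example, the exact count is a couple of lines with tiktoken, assuming a version recent enough to know the model name you pass in:

```python
import tiktoken

prompt = "Summarize the following report in three bullet points."

# encoding_for_model maps a model name to its encoding (o200k_base for gpt-4o)
enc = tiktoken.encoding_for_model("gpt-4o")
print(len(enc.encode(prompt)), "tokens")
```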
Use the estimate to size and shape your prompt. Use the official tokenizer to confirm before a high-stakes run.
The takeaway
Tokens are the units models actually understand. Words and characters are convenient for humans but invisible to the model. Once you have a feel for token density — and how it shifts by language, by punctuation, by code — most of the awkward edges of working with LLMs get easier to reason about.
Try counting your next ten prompts. The intuition shifts fast.