What is an AI token, really?
Tokens aren't words and aren't characters. They're the chunks an AI model actually sees — and understanding the difference saves money and frustration.
by PromptCount Team
If you've used the OpenAI API or worked with prompt length limits, you've seen the word token. It's everywhere — pricing pages, model docs, error messages. Most explanations are vague: "tokens are roughly four characters" or "about ¾ of a word." Those rules of thumb are okay for back-of-envelope math, but they hide what's actually going on.
Here's a clearer picture.
What a token actually is
Language models don't read characters. They don't read words either. They read tokens — chunks of text that the model's tokenizer has decided are useful primitives.
The tokenizer is a piece of preprocessing that runs before the model ever sees your text. It takes your prompt and splits it into pieces, then turns each piece into a number (a token ID). The model only ever works with the numbers.
A typical tokenizer like cl100k_base (used by GPT-4 class models) has a vocabulary of about 100,000 tokens. The training process figures out which character sequences should be merged into single tokens and which should stay split.
The practical result:
| Text | Tokens |
|---|---|
| Hello | 1 token |
| Hello world | 2 tokens |
| Hello, world! | 4 tokens (`Hello`, `,`, ` world`, `!`) |
| internationalization | 1–2 tokens (common enough to be one) |
| pneumonoultramicroscopicsilicovolcanoconiosis | 7+ tokens (rare, split into pieces) |
| 你好世界 (Hello world in Chinese) | ~5–6 tokens |
Common words get their own tokens. Rare words get split. Punctuation usually gets its own token. Leading spaces are attached to the following word.
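You can inspect these splits yourself with OpenAI's open-source tiktoken library, which ships the cl100k_base encoding. A minimal sketch, assuming tiktoken is installed (pip install tiktoken):

```python
import tiktoken

# Load the cl100k_base encoding used by GPT-4-class models
enc = tiktoken.get_encoding("cl100k_base")

for text in ["Hello", "Hello world", "Hello, world!", "internationalization"]:
    token_ids = enc.encode(text)
    # Decode each ID on its own to see exactly how the text was split
    pieces = [enc.decode([tid]) for tid in token_ids]
    print(f"{text!r}: {len(token_ids)} tokens -> {pieces}")
```

Notice how the space rides along with the word that follows it, exactly as described above.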
Why this matters
Three concrete reasons.
1. Billing
Every commercial AI API charges by tokens, both input and output. When you send a 1,000-character prompt to GPT-4 and get a 2,000-character reply, you're not paying for 3,000 characters. You're paying for roughly 250 input tokens and 500 output tokens. Pricing pages quote rates per million tokens, but they assume you know how to count yours.
If you're running a tool at scale, even a 20% error in your token estimates compounds quickly against your budget. Better-than-rough estimates matter.
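The arithmetic itself is trivial once you have a count. A sketch with placeholder prices (the rates below are made up for illustration; substitute the real numbers from your provider's pricing page):

```python
# Illustrative rates only; check your provider's current pricing page.
INPUT_PRICE_PER_MILLION = 2.50    # dollars per million input tokens (assumed)
OUTPUT_PRICE_PER_MILLION = 10.00  # dollars per million output tokens (assumed)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the rates above."""
    return (input_tokens * INPUT_PRICE_PER_MILLION
            + output_tokens * OUTPUT_PRICE_PER_MILLION) / 1_000_000

# The 250-in / 500-out example from above, scaled to 100,000 requests
print(f"{request_cost(250, 500) * 100_000:.2f} dollars per 100k requests")
```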
2. Context limits
Every model has a maximum number of tokens it can process in one go. ChatGPT-class models have offered windows from 4K to 256K tokens. Claude has gone past 200K. Gemini 2.5 Pro is over a million.
But a context window isn't just for your prompt — it includes the system instructions, the conversation history, retrieved documents, and the model's reply. Filling 90% of the window with prompt leaves no room for output.
A practical rule: keep prompts under 70% of the window, and budget the rest for output and any system overhead.
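That rule is easy to enforce in code. A sketch, where the window size and the token count both come from whatever estimator you trust:

```python
def prompt_fits(prompt_tokens: int, context_window: int, max_share: float = 0.7) -> bool:
    """True if the prompt stays under the chosen share of the context window,
    leaving the rest for output, system instructions, and history."""
    return prompt_tokens <= context_window * max_share

print(prompt_fits(80_000, 128_000))   # True: 80k is under the 89.6k budget
print(prompt_fits(100_000, 128_000))  # False: no room left for the reply
```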
3. Quality
Long prompts often produce worse output. Repetition, drift, and lost focus all increase as a prompt grows. Counting tokens forces you to ask "what's actually essential here?"
How tokenization varies across models
| Model | Tokenizer | Notes |
|---|---|---|
| GPT-4 / GPT-4o | cl100k_base / o200k_base | Optimized for English + code |
| Claude Opus / Sonnet | Anthropic's own | Slightly different splits, similar density |
| Gemini | SentencePiece variant | Different on multilingual text |
| Llama | SentencePiece | Open weights, different vocabulary |
| CLIP (Stable Diffusion) | BPE, 77-token cap | Very different! See below |
For most Western text, the differences are small — within 10–20%. For Chinese, Japanese, code, and emoji, differences can be much larger.
The Midjourney / Stable Diffusion case
Image generators like Stable Diffusion don't use the same tokenizer as ChatGPT. They use CLIP's text encoder, which has a much smaller vocabulary and a hard limit of 77 tokens per prompt. Anything past 77 tokens is silently cut off.
This is why a long, detailed Stable Diffusion prompt that "looks fine" can produce results that ignore everything in the second half. The model never saw the second half.
Midjourney has its own tokenizer but shows similar sensitivity to prompt length: long prompts dilute strong signals.
If you write image prompts, your token budget is tighter, not looser. Two short prompts often beat one long one.
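If you want to check a prompt against the 77-token cap before generating, the Hugging Face transformers library bundles CLIP's tokenizer. A sketch, assuming the openai/clip-vit-base-patch32 checkpoint (the exact checkpoint your pipeline uses may differ):

```python
from transformers import CLIPTokenizer

# CLIP tokenizer as used by Stable Diffusion's text encoder (assumed checkpoint)
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

prompt = "a highly detailed oil painting of a lighthouse at dusk, volumetric light, 8k"
token_ids = tokenizer(prompt)["input_ids"]  # includes start-of-text and end-of-text tokens

limit = tokenizer.model_max_length  # 77 for CLIP
if len(token_ids) > limit:
    print(f"{len(token_ids)} tokens: everything past {limit} will be dropped silently.")
else:
    print(f"{len(token_ids)} of {limit} tokens used.")
```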
How we estimate tokens
The AI Prompt Counter on this site uses a calibrated heuristic:
- ~4 characters per token for Latin text
- ~1.5 characters per token for CJK text
- ~3 characters per token for digits
- ~6 characters per token for whitespace/punctuation
This is not a real tokenizer. It's a fast estimate calibrated against cl100k_base and o200k_base outputs on mixed inputs. The numbers are typically within 10% of the official count — close enough to budget against, not close enough to bill against.
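For the curious, here is a sketch of that kind of heuristic in Python. The character ranges and the rounding are assumptions for illustration, not the counter's actual implementation:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate from per-category characters-per-token ratios."""
    latin = cjk = digits = other = 0
    for ch in text:
        if "\u4e00" <= ch <= "\u9fff" or "\u3040" <= ch <= "\u30ff":
            cjk += 1      # CJK ideographs and Japanese kana (assumed ranges)
        elif ch.isdigit():
            digits += 1
        elif ch.isalpha():
            latin += 1    # treat all other letters as Latin-like
        else:
            other += 1    # whitespace, punctuation, symbols
    return round(latin / 4 + cjk / 1.5 + digits / 3 + other / 6)

print(estimate_tokens("Hello, world!"))  # 3, versus 4 from cl100k_base: close enough to budget with
```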
If you need exact counts, every major provider publishes their tokenizer or an API endpoint:
- OpenAI: the `tiktoken` library
- Anthropic: the Claude tokenizer endpoint
- Gemini: `count_tokens` in the SDK
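For OpenAI models, for example, the exact count is a couple of lines with tiktoken, assuming a version recent enough to know the model name you pass in:

```python
import tiktoken

prompt = "Summarize the following report in three bullet points."

# encoding_for_model maps a model name to its encoding (o200k_base for gpt-4o)
enc = tiktoken.encoding_for_model("gpt-4o")
print(len(enc.encode(prompt)), "tokens")
```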
Use the estimate to size and shape your prompt. Use the official tokenizer to confirm before a high-stakes run.
The takeaway
Tokens are the units models actually understand. Words and characters are convenient for humans but invisible to the model. Once you have a feel for token density — and how it shifts by language, by punctuation, by code — most of the awkward edges of working with LLMs get easier to reason about.
Try counting your next ten prompts. The intuition shifts fast.