Image prompts · CLIP-77 aware

Midjourney Prompt Counter

See how many tokens your image prompt actually uses. Many image generators (Stable Diffusion, Flux, and by most accounts Midjourney) rely on a CLIP-style text encoder with a hard 77-token cap. Anything past that gets dropped or diluted.

The 77-token rule

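CLIP, the text encoder used by Stable Diffusion and, alongside a second encoder, by Flux, has a fixed input size of 77 tokens. Midjourney has not published its text encoder, so treat the same limit as a working assumption there. Your prompt is tokenized and padded or cut to 77 positions, two of which are taken by start and end markers, and CLIP returns one embedding vector per position. Anything past the limit is either truncated (in older pipelines) or split into chunks that are encoded separately and then combined (in newer ones). Either way, the second half of a long prompt rarely lands the way you intended.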
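A minimal sketch of the two behaviours described above, using a plain list of integers in place of a real tokenizer. The function names are illustrative and not any pipeline's actual API, and the window of 75 assumes two of CLIP's 77 positions are reserved for start and end markers. How the encoded chunks are combined afterwards varies by tool and is not shown.

```python
CLIP_POSITIONS = 77
RESERVED = 2  # assumption: one start marker and one end marker per window
USABLE = CLIP_POSITIONS - RESERVED


def truncate(token_ids, limit=USABLE):
    """Older-pipeline behaviour: keep the first `limit` tokens, drop the rest."""
    return token_ids[:limit]


def chunk(token_ids, limit=USABLE):
    """Newer-pipeline behaviour: split into consecutive windows of `limit` tokens."""
    return [token_ids[i:i + limit] for i in range(0, len(token_ids), limit)]


prompt_ids = list(range(100))  # stand-in for a 100-token prompt
kept = truncate(prompt_ids)
windows = chunk(prompt_ids)
print(len(kept), [len(w) for w in windows])  # 75 [75, 25]
```

In the truncating case the last 25 tokens never reach the model; in the chunking case they arrive in a second, mostly empty window.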

In practice, 77 tokens is about 55–60 English words — fewer if your prompt is heavy with adjectives or hyphenated terms.
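As a rough illustration of that ratio, the sketch below converts a word count into an estimated token count. The `estimate_tokens` helper and its default ratio are hypothetical, derived only from the 55–60 words figure above (midpoint 57.5); a real CLIP tokenizer will give different numbers for individual prompts.

```python
def estimate_tokens(prompt, tokens_per_word=77 / 57.5):
    """Heuristic only: scale the whitespace-separated word count by an average ratio."""
    words = prompt.split()
    return round(len(words) * tokens_per_word)


example = "a futuristic fashion model in a neon-lit Tokyo street"
print(len(example.split()), estimate_tokens(example))  # 9 12
```

Use an estimate like this for quick sanity checks only, and rely on an actual tokenizer count when a prompt is close to the cap.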

The tool above shows your token budget against the 77-token cap when you select the Stable Diffusion model preset. It shows green under 50% of the cap, amber under 85%, and red above that.
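The colour thresholds can be sketched as a small function. This is an illustration of the rule as stated, not the tool's actual code; the name `budget_status` and the handling of values exactly on a boundary are assumptions.

```python
def budget_status(tokens_used, cap=77):
    """Map a token count to a colour using the 50% and 85% thresholds."""
    share = tokens_used / cap
    if share < 0.50:
        return "green"
    if share < 0.85:
        return "amber"
    return "red"


for used in (30, 50, 70):
    print(used, budget_status(used))
```

With a cap of 77, the switch to amber happens at 39 tokens and the switch to red at 66.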

A reusable structure for image prompts

Five slots that fit comfortably under 77 tokens:

Subject — 5–10 tokens (“a futuristic fashion model”)
Setting — 5–10 tokens (“in a neon-lit Tokyo street”)
Lighting / camera — 10–15 tokens (“85mm lens, soft rim lighting, shallow depth of field”)
Style — 5–10 tokens (“editorial photography style”)
Composition / aspect — 3–6 tokens (“9:16 vertical”)

Total: roughly 28–51 tokens, depending on how full each slot is. Room to spare for one or two specifics.
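A quick tally of the five slot ranges, as a sketch. The dictionary keys are just labels for the slots listed above; the numbers are the per-slot minimums and maximums from that list.

```python
CAP = 77

# (minimum, maximum) tokens per slot, taken from the list above
SLOTS = {
    "subject": (5, 10),
    "setting": (5, 10),
    "lighting_camera": (10, 15),
    "style": (5, 10),
    "composition": (3, 6),
}

low = sum(lo for lo, _ in SLOTS.values())
high = sum(hi for _, hi in SLOTS.values())
print(low, high, CAP - high)  # 28 51 26
```

Even with every slot filled to its maximum, about 26 tokens remain under the cap.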

What to cut when you’re over budget

Three patterns push image prompts past 77 tokens. All three are easy to trim:

Stacked style adjectives: “moody, atmospheric, dramatic, cinematic, professional, photorealistic, ultra-detailed, hyperrealistic, masterpiece...” Pick one or two that actually point at different things.
Multiple influences in long form: “in the style of Petra Collins meets Wong Kar-wai meets Ridley Scott...” Pick one. The model can only blend so many references in 77 tokens.
Generic quality words: “masterpiece, best quality, ultra-detailed, professional photography, award-winning”. These mostly burn tokens without carrying visual information on modern models.
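One way to automate part of this trimming, sketched for comma-separated prompts. The `GENERIC_QUALITY` set contains only the filler phrases quoted above, and `trim_prompt` is a hypothetical helper; choosing which style adjectives or influences to keep still needs a human decision.

```python
# Filler phrases quoted in the list above; extend to taste.
GENERIC_QUALITY = {
    "masterpiece",
    "best quality",
    "ultra-detailed",
    "professional photography",
    "award-winning",
}


def trim_prompt(prompt):
    """Drop generic quality phrases and repeated phrases from a comma-separated prompt."""
    kept = []
    seen = set()
    for part in prompt.split(","):
        phrase = part.strip()
        key = phrase.lower()
        if not phrase or key in GENERIC_QUALITY or key in seen:
            continue
        seen.add(key)
        kept.append(phrase)
    return ", ".join(kept)


raw = "a futuristic fashion model, masterpiece, best quality, cinematic, Cinematic, award-winning"
print(trim_prompt(raw))  # a futuristic fashion model, cinematic
```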

Negative prompts count separately

Stable Diffusion-style tools typically encode negative prompts (the things you don’t want) in a separate CLIP pass, so the negative prompt gets its own 77-token budget. A 50-token positive prompt plus a 20-token negative prompt fits without conflict. Use a short, aimed negative prompt to remove common failure modes rather than stuffing everything into the positive prompt.
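The separate budgets can be expressed as two independent checks. This sketch assumes each prompt is encoded in its own pass with its own 77-token cap, as described above; `fits_budget` is a hypothetical helper, not part of any tool.

```python
def fits_budget(positive_tokens, negative_tokens, cap=77):
    """Each prompt is checked against the cap on its own; the counts are not summed."""
    return positive_tokens <= cap and negative_tokens <= cap


print(fits_budget(50, 20))  # True
print(fits_budget(60, 20))  # True, even though 60 + 20 exceeds 77
print(fits_budget(80, 5))   # False, the positive prompt alone is over the cap
```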
