Stable Diffusion Prompt Counter
Image generators built on Stable Diffusion (and most Flux pipelines) feed your prompt through CLIP. CLIP has a hard 77-token cap per chunk, and that figure includes the start and end markers, so roughly 75 tokens are available for your own text. Anything past that is truncated, averaged, or chunked, and the second half of your prompt usually loses most of its signal.
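As a rough illustration of the cap, the sketch below counts tokens and reports what would fall past the limit. The `rough_tokenize` helper is a hypothetical stand-in that splits on words and punctuation; real CLIP uses byte-pair encoding and can split uncommon words into several tokens, so treat these counts as a lower bound and pass in a real tokenizer function if you have one.

```python
import re
from typing import Callable, Dict, List

CLIP_CONTEXT = 77                  # positions CLIP sees per chunk
USABLE_TOKENS = CLIP_CONTEXT - 2   # minus start-of-text and end-of-text markers


def rough_tokenize(prompt: str) -> List[str]:
    """Hypothetical stand-in tokenizer: words and punctuation marks.

    Real CLIP BPE may split rare words further, so this can undercount.
    """
    return re.findall(r"[a-z0-9]+|[^\sa-z0-9]", prompt.lower())


def budget_report(
    prompt: str,
    tokenize: Callable[[str], List[str]] = rough_tokenize,
) -> Dict[str, object]:
    """Count tokens and list the ones that fall past the usable budget."""
    tokens = tokenize(prompt)
    return {
        "count": len(tokens),
        "over_by": max(0, len(tokens) - USABLE_TOKENS),
        "dropped": tokens[USABLE_TOKENS:],  # what plain truncation would lose
    }
```

Calling `budget_report(prompt)["dropped"]` shows which trailing words a plain truncation would discard, which is usually a good hint about what to move toward the front of the prompt.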
How SD 1.5 / SDXL / Flux differ
All three use CLIP as a text encoder, but the configuration varies in ways that affect prompt budget:
| Model | Encoder(s) | Effective cap |
|---|---|---|
| SD 1.5 | Single CLIP | 77 tokens (single chunk) |
| SDXL | CLIP-L + CLIP-G | 77 tokens per encoder; same prompt sent to both |
| Flux Dev / Schnell | CLIP + T5 | CLIP capped at 77; T5 accepts up to 512 tokens |
Even on Flux, the core visual signal still mostly goes through CLIP. T5 expands what the model can understand, but a CLIP-overflowed prompt loses subject anchoring. Treat 77 tokens as the budget that matters for clean composition, no matter which model you’re using.
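The table above can be written down as a small lookup, which makes it easy to check one prompt against every encoder a model uses. This is only a sketch: the `MODELS` keys and the `overflowing_encoders` helper are illustrative names, the caps are copied from the table, and the per-encoder token counts are expected to come from the caller, since CLIP and T5 tokenize the same text differently.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Encoder:
    name: str
    cap: int  # token cap per chunk, as listed in the table


# Illustrative keys; caps are taken from the table, not from any library.
MODELS: Dict[str, List[Encoder]] = {
    "sd15": [Encoder("CLIP", 77)],
    "sdxl": [Encoder("CLIP-L", 77), Encoder("CLIP-G", 77)],
    "flux": [Encoder("CLIP", 77), Encoder("T5", 512)],
}


def overflowing_encoders(model: str, token_counts: Dict[str, int]) -> List[str]:
    """Return the encoders whose supplied token count exceeds their cap."""
    return [
        enc.name
        for enc in MODELS[model]
        if token_counts.get(enc.name, 0) > enc.cap
    ]
```

For example, a Flux prompt measured at 120 CLIP tokens and 130 T5 tokens would be reported as overflowing only CLIP, which is the case the paragraph above warns about.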
Chunked / BREAK prompts
Many SD frontends (Automatic1111, ComfyUI, Forge) let you split a prompt into multiple 77-token chunks using a BREAK keyword. Each chunk is encoded independently and the embeddings are concatenated.
Chunked prompts let you fit more text, but they don’t solve the dilution problem — the second chunk competes with the first for attention. Two short well-structured chunks beat one long blob, but neither beats a single crisp 70-token prompt.
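To see how a BREAK prompt divides up, the sketch below splits on the keyword and estimates how full each chunk is. It does not reproduce any frontend's actual parser: the case-sensitive `BREAK` match, the 75-token usable budget per chunk, and the word-and-punctuation `rough_count` are all simplifying assumptions.

```python
import re
from typing import Dict, List

CHUNK_POSITIONS = 77   # positions per encoded chunk
CHUNK_TOKENS = 75      # assumed usable tokens per chunk (77 minus two markers)


def split_on_break(prompt: str) -> List[str]:
    """Split on the uppercase BREAK keyword and drop empty pieces."""
    parts = re.split(r"\bBREAK\b", prompt)
    return [p.strip(" ,\n") for p in parts if p.strip(" ,\n")]


def rough_count(chunk: str) -> int:
    """Approximate token count: words and punctuation (real BPE may be higher)."""
    return len(re.findall(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]", chunk))


def chunk_plan(prompt: str) -> List[Dict[str, object]]:
    """Describe each chunk: estimated tokens, padding needed, and overflow."""
    plan = []
    for index, text in enumerate(split_on_break(prompt)):
        n = rough_count(text)
        plan.append({
            "chunk": index,
            "text": text,
            "tokens": n,
            "padding": max(0, CHUNK_TOKENS - n),
            "overflow": max(0, n - CHUNK_TOKENS),
        })
    return plan


def total_positions(plan: List[Dict[str, object]]) -> int:
    """Sequence length after concatenating one 77-position block per chunk."""
    return CHUNK_POSITIONS * len(plan)
```

A two-chunk prompt such as `"a red fox, snowy forest BREAK soft light, 35mm photo"` yields two mostly padded chunks. That is a useful reminder that BREAK buys separation between ideas, not extra attention for each one.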
Negative prompts
Negative prompts go through CLIP separately, so they have their own 77-token budget. Keep them short and aimed — a 10–20 token negative prompt that targets specific failure modes (e.g., blurry, distorted hands, text, watermark, low resolution) is more effective than 60 tokens of generic quality words.