CLIP Skip
Also known as: clip_skip
CLIP Skip selects which layer of the CLIP text encoder supplies the prompt encoding. A value of 1 uses the final layer (nothing is skipped), and each step above 1 stops one layer earlier. Higher values tend to produce more stylized, abstract, or "anime-like" interpretations. Lower values tend to produce more literal, photorealistic results that follow the prompt closely.
What It Does
CLIP's text encoder translates your prompt into a numerical representation the image model can understand. It has multiple layers, each capturing a different level of abstraction. As a rule of thumb, the early layers capture broad concepts and composition, while the later layers capture specific, literal details.
When you skip the final layers (CLIP Skip 2 or higher), the model receives a more abstract, less literal interpretation of your prompt. This tends to produce images with a more stylized, artistic feel, which is why CLIP Skip 2 is popular in the anime and illustration community. At CLIP Skip 1 (no layers skipped), the model gets the full detailed encoding, which tends to give more photorealistic and prompt-literal outputs.
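The layer-skipping idea described above can be sketched with a toy encoder. This is only an illustration under stated assumptions: the function name encode_with_clip_skip, the 12-layer toy stack, and the "add 1" layers are hypothetical stand-ins, not the API of any real library. It follows this page's numbering, where 1 means no layers are skipped. Real implementations may also apply a final normalization step to the chosen output, which the sketch leaves out.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def encode_with_clip_skip(
    layers: Sequence[Layer], embedding: Vector, clip_skip: int = 1
) -> Vector:
    """Run a stack of encoder layers, stopping (clip_skip - 1) layers early.

    clip_skip=1 runs every layer; clip_skip=2 stops before the last one.
    """
    if not 1 <= clip_skip <= len(layers):
        raise ValueError(f"clip_skip must be between 1 and {len(layers)}")

    layers_to_run = len(layers) - (clip_skip - 1)
    hidden = embedding
    for layer in layers[:layers_to_run]:
        hidden = layer(hidden)
    return hidden


# Toy stand-in for an encoder: 12 layers that each add 1 to every component,
# so the output value shows how many layers actually ran.
toy_layers = [lambda v: [x + 1.0 for x in v] for _ in range(12)]

full = encode_with_clip_skip(toy_layers, [0.0, 0.0], clip_skip=1)
skipped = encode_with_clip_skip(toy_layers, [0.0, 0.0], clip_skip=2)

print(full)     # [12.0, 12.0] -> all 12 layers ran
print(skipped)  # [11.0, 11.0] -> the final layer was skipped
```

Other tools may number this setting differently, so check the documentation of the interface you use before assuming that 1 means "no skip".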
Value Ranges
CLIP Skip 1 (No skip)
Full encoder output. Most literal prompt interpretation. Best for photorealism and precise prompt following.
CLIP Skip 2
Slightly abstracted. The most common non-default value. Produces stylized results favored for anime and illustration.
CLIP Skip 3+
Highly abstracted. The model works from a very loose interpretation of the prompt. Results can be unpredictable but artistically interesting.
Visual Comparison
Images pending for clip_skip = 1, clip_skip = 2, and clip_skip = 3.
Tips
- CLIP Skip 1 is the default for most models and works best for photorealistic prompts.
- Try CLIP Skip 2 for anime, illustration, or stylized art.
- Values above 3 are rarely useful and can produce incoherent results.
- This parameter only applies to models that use CLIP as their text encoder (SDXL, SD 1.5, etc.).
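As a rough sketch, the tips above could be encoded as a small helper that proposes a starting value and flags unusual ones. The names SUGGESTED_CLIP_SKIP, suggest_clip_skip, and check_clip_skip are hypothetical and exist only for this example. The suggested values are starting points to experiment from, not fixed rules.

```python
import warnings
from typing import Optional

# Hypothetical starting points taken from the tips above; not a library API.
SUGGESTED_CLIP_SKIP = {
    "photorealistic": 1,
    "anime": 2,
    "illustration": 2,
    "stylized": 2,
}


def suggest_clip_skip(style: str, uses_clip_encoder: bool = True) -> Optional[int]:
    """Return a starting CLIP Skip value, or None if the setting does not apply."""
    if not uses_clip_encoder:
        # The parameter only matters for models with a CLIP text encoder.
        return None
    # Fall back to 1 (no skip) for styles not listed above.
    return SUGGESTED_CLIP_SKIP.get(style.lower(), 1)


def check_clip_skip(value: int) -> int:
    """Reject invalid values and warn about values above 3."""
    if value < 1:
        raise ValueError("clip_skip must be 1 or greater")
    if value > 3:
        warnings.warn(
            "clip_skip above 3 is rarely useful and can produce incoherent results"
        )
    return value


print(suggest_clip_skip("anime"))           # 2
print(suggest_clip_skip("photorealistic"))  # 1
print(check_clip_skip(2))                   # 2
```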