Stable Diffusion: SDXL, SD 1.5 & SD 3.x
A comprehensive overview of the Stable Diffusion model family – from SD 1.5 to the latest generation.
01 The Stable Diffusion Family
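Stable Diffusion is a family of openly released text-to-image models. The first versions came out of a collaboration between the CompVis group at LMU Munich, Runway, and Stability AI; later generations such as SDXL and SD 3.x were developed by Stability AI. License terms differ between versions. Because the weights can be downloaded and run locally, these models have shaped local AI image generation and are the foundation for thousands of community models, LoRAs, and workflows.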
02 SD 1.5 – The Classic
Stable Diffusion 1.5 is one of the most widely used models in the family and the basis for countless fine-tunes and LoRAs. It natively generates images at 512x512 pixels and can run on older GPUs with as little as 4 GB of VRAM. Although newer architectures have superseded it technically, SD 1.5 remains relevant thanks to its large ecosystem of community models, LoRAs, and embeddings. Many specialized models (Realistic Vision, DreamShaper, etc.) are based on SD 1.5.
03 SDXL 1.0 – The Standard
SDXL (Stable Diffusion XL) is widely treated as the standard for high-quality image generation. Its native resolution is 1024x1024 pixels, and it offers significantly better image quality, text understanding, and coherence than SD 1.5. It consists of a base model and an optional refiner, a second model that can be run on the base model's output to add fine detail. Typical VRAM requirement: 8–12 GB. The SDXL ecosystem is growing steadily, with dedicated LoRAs, ControlNets, and community models.
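The native resolutions above (512x512 for SD 1.5, 1024x1024 for SDXL) are square. For other aspect ratios, a common rule of thumb is to keep the total pixel count close to the native one. The sketch below illustrates that arithmetic. The function name `suggest_size`, the model keys, and the rounding to multiples of 64 are assumptions made for this example, not documented requirements of either model.

```python
import math

# Native square side lengths, taken from the text above.
NATIVE_SIDE = {"sd15": 512, "sdxl": 1024}


def suggest_size(model: str, aspect_ratio: float, multiple: int = 64) -> tuple[int, int]:
    """Return (width, height) with roughly the model's native pixel count.

    aspect_ratio is width / height. Snapping to `multiple` is an assumed
    convention for this sketch, not a rule stated by the model authors.
    """
    side = NATIVE_SIDE[model]
    target_pixels = side * side

    # Solve width * height = target_pixels with width / height = aspect_ratio.
    width = math.sqrt(target_pixels * aspect_ratio)
    height = width / aspect_ratio

    def snap(value: float) -> int:
        # Round to the nearest multiple, but never below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)


if __name__ == "__main__":
    print(suggest_size("sdxl", 1.0))
    print(suggest_size("sdxl", 16 / 9))
    print(suggest_size("sd15", 3 / 2))
```

Because of the snapping, the result only approximates the requested aspect ratio; treat the output as a starting point to try, not a guaranteed optimum.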
04 SD 3.x – The New Generation
Stable Diffusion 3 and its variants (3.5 Medium, 3.5 Large) use a new architecture, the Multimodal Diffusion Transformer (MMDiT), in place of the U-Net used by SD 1.5 and SDXL. The advantages include significantly better text understanding, more accurate prompt following, and improved anatomy. SD 3.5 Large needs at least 12 GB of VRAM, with actual usage depending on precision and offloading settings, and offers the best quality in the SD family.
05 Which Model for Whom?
Recommendations based on your situation:
- Beginners with an older GPU (4–6 GB VRAM): SD 1.5 with community models
- Standard users (8–12 GB VRAM): SDXL for the best balance of quality and speed
- Quality-conscious users (12+ GB VRAM): SD 3.5, or Flux (a separate model family from Black Forest Labs, not part of Stable Diffusion), for maximum image quality
- Specific styles desired: SD 1.5 has the largest selection of specialized fine-tunes
- Text in images needed: SD 3.x has the best text rendering capability
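The VRAM tiers in the list above can be written down as a small lookup. This is only a sketch of the list, not an official sizing tool: the function name `recommend_model` is hypothetical, and two boundary decisions are assumptions of this example (the 6–8 GB gap is assigned to SD 1.5, and exactly 12 GB is assigned to the upper tier).

```python
def recommend_model(vram_gb: float) -> str:
    """Map available VRAM (in GB) to the tiers from the list above."""
    if vram_gb < 4:
        # The guide mentions 4 GB as the low end for SD 1.5.
        return "none (below the 4 GB mentioned for SD 1.5)"
    if vram_gb < 8:
        # 4-6 GB tier; the unlisted 6-8 GB range is placed here as an assumption.
        return "SD 1.5"
    if vram_gb < 12:
        return "SDXL"
    # 12 GB appears in two tiers of the list; this sketch uses the upper one.
    return "SD 3.5"


if __name__ == "__main__":
    for vram in (4, 8, 12, 24):
        print(vram, "GB ->", recommend_model(vram))
```

The last two bullets (specific styles, text in images) are preferences and not hardware limits, so they are deliberately left out of the lookup.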
06 Model Formats
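Stable Diffusion models are distributed in several formats: Safetensors (recommended), CKPT (an older format that can be unsafe), and Diffusers (the Hugging Face multi-file layout). CKPT files are pickle-based PyTorch checkpoints, so loading one from an untrusted source can execute arbitrary code; Safetensors files store only tensor data and metadata. For ComfyUI, prefer Safetensors. The format only affects how the weights are stored, so image quality is the same.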
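To illustrate why Safetensors is considered the safer container, the sketch below reads only the header of a `.safetensors` file using the Python standard library. It relies on the published file layout: an 8-byte little-endian length, followed by a JSON header that lists each tensor's dtype, shape, and byte offsets. The function names and the tiny demo file are made up for this example; in practice you would use the `safetensors` library or your UI's loader.

```python
import json
import os
import struct
import tempfile


def read_safetensors_header(path: str) -> dict:
    """Return the JSON header of a .safetensors file without loading tensors."""
    file_size = os.path.getsize(path)
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) != 8:
            raise ValueError("file too short to be a safetensors file")
        # First 8 bytes: header length as an unsigned little-endian 64-bit integer.
        (header_len,) = struct.unpack("<Q", prefix)
        if header_len > file_size - 8:
            raise ValueError("declared header length exceeds file size")
        # The header is plain UTF-8 JSON, so parsing it executes no code.
        return json.loads(f.read(header_len).decode("utf-8"))


def write_demo_file(path: str) -> None:
    """Write a tiny synthetic file holding one float32 tensor of two values."""
    header = {
        "__metadata__": {"note": "demo"},
        "demo.weight": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]},
    }
    header_bytes = json.dumps(header).encode("utf-8")
    payload = struct.pack("<2f", 1.0, 2.0)  # 2 x 4 bytes = 8 bytes of tensor data
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))
        f.write(header_bytes)
        f.write(payload)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "demo.safetensors")
        write_demo_file(demo_path)
        for name, info in read_safetensors_header(demo_path).items():
            print(name, info)
```

Pointing `read_safetensors_header` at a downloaded checkpoint is a quick way to list its tensor names and shapes before loading it. A CKPT file offers no equivalent, since reading it generally means unpickling it.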