Wan 2.2: Text-to-Video & Image-to-Video
Everything about the Wan 2.2 video AI models – from setup to optimal video generation.
01 What Is Wan 2.2?
Wan 2.2 is a family of open-source video AI models that can be run locally. They enable Text-to-Video (T2V) and Image-to-Video (I2V) generation directly on your PC. Wan 2.2 has established itself as one of the best open-source video generators and is particularly well integrated into ComfyUI.
02 Model Variants
Wan 2.2 comes in various sizes:
- Wan 2.2 1.3B T2V: Smaller variant for text-to-video. 8 GB VRAM minimum. Fast generation, moderate quality. Ideal for experimenting.
- Wan 2.2 14B T2V: Large variant for text-to-video. 24+ GB VRAM recommended. Significantly better quality and coherence. Recommended for final results.
- Wan 2.2 1.3B I2V: Smaller variant for image-to-video. Takes an image as input and animates it.
- Wan 2.2 14B I2V: Large variant for image-to-video. Best quality for image animation.
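As a rough illustration of how the VRAM figures in this list could guide a choice, here is a minimal Python sketch. It treats the "24+ GB recommended" figure as a hard threshold, which is a simplification, and it covers only the text-to-video variants because the list gives no VRAM numbers for I2V. The function name `pick_t2v_variant` is hypothetical.

```python
from typing import Optional

# VRAM figures copied from the list above. Treating "24+ GB recommended"
# as a strict cutoff is an assumption made for this sketch.
T2V_VARIANTS = [
    # (variant name, VRAM in GB mentioned in the guide), largest first
    ("Wan 2.2 14B T2V", 24),
    ("Wan 2.2 1.3B T2V", 8),
]


def pick_t2v_variant(vram_gb: float) -> Optional[str]:
    """Return the largest T2V variant whose stated VRAM figure fits, else None."""
    for name, needed_gb in T2V_VARIANTS:
        if vram_gb >= needed_gb:
            return name
    return None


if __name__ == "__main__":
    for vram in (6, 12, 24):
        print(f"{vram} GB ->", pick_t2v_variant(vram))
```

In practice, offloading and quantized checkpoints can shift these limits, so the result is best read as a starting point rather than a rule.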
03 Setup in ComfyUI
For Wan 2.2 in ComfyUI you need: the Wan 2.2 model (download from Hugging Face), the CLIP encoder, and optionally the VAE. Place the files in the corresponding ComfyUI folders. Use our pre-configured Wan 2.2 workflows from the ComfyVault Gallery for the fastest start.
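To check that the files ended up where ComfyUI looks for them, a small script along these lines may help. The subfolder names in `EXPECTED_DIRS` are assumptions about a typical ComfyUI install, not something this guide specifies, and `find_empty_model_dirs` is a hypothetical helper; adjust both to your setup.

```python
from pathlib import Path

# Assumed layout (not taken from the guide): where each file type is
# commonly placed relative to the ComfyUI root. Adjust to your install.
EXPECTED_DIRS = {
    "model": "models/diffusion_models",
    "CLIP encoder": "models/text_encoders",
    "VAE": "models/vae",
}


def find_empty_model_dirs(comfy_root: str) -> list[str]:
    """Return the labels of expected folders that are missing or contain no files."""
    root = Path(comfy_root)
    empty = []
    for label, rel_path in EXPECTED_DIRS.items():
        folder = root / rel_path
        has_file = folder.is_dir() and any(p.is_file() for p in folder.iterdir())
        if not has_file:
            empty.append(label)
    return empty


if __name__ == "__main__":
    # "ComfyUI" is a placeholder path for this sketch.
    print("Folders still empty:", find_empty_model_dirs("ComfyUI"))
```

The script only checks that each folder holds at least one file; it does not verify that the file is the right one for Wan 2.2.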
04 Optimal Settings
Tips for the best video quality:
- Start with lower resolution (480p) and then upscale – saves a lot of time when experimenting
- Use 30–50 sampling steps for a good quality-speed ratio
- CFG Scale: 6–8 for natural movements, higher for stronger prompt fidelity
- Use short, concise prompts – video models prefer clarity over detail
- For I2V: Use high-quality input images – input quality determines output quality
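The tips above can be collected into a small settings object with a sanity check. This is only a sketch: `VideoSettings` and `review_settings` are hypothetical names, not part of ComfyUI or Wan 2.2, and the 40-word prompt limit is an arbitrary stand-in for "short and concise".

```python
from dataclasses import dataclass


@dataclass
class VideoSettings:
    height: int = 480        # start at 480p and upscale afterwards
    steps: int = 40          # the guide suggests 30-50 sampling steps
    cfg_scale: float = 7.0   # the guide suggests 6-8 for natural movement
    prompt: str = ""


def review_settings(s: VideoSettings, max_prompt_words: int = 40) -> list[str]:
    """Return notes for every setting that falls outside the guide's suggestions."""
    notes = []
    if s.height > 480:
        notes.append("Resolution above 480p: slower while experimenting.")
    if not 30 <= s.steps <= 50:
        notes.append("Steps outside the suggested 30-50 range.")
    if s.cfg_scale < 6:
        notes.append("CFG below 6: outside the suggested 6-8 range.")
    elif s.cfg_scale > 8:
        notes.append("CFG above 8: stronger prompt fidelity, less natural motion.")
    # max_prompt_words is an assumed threshold, not a value from the guide.
    if len(s.prompt.split()) > max_prompt_words:
        notes.append("Prompt is long: consider a shorter, clearer description.")
    return notes


if __name__ == "__main__":
    draft = VideoSettings(height=720, steps=20, cfg_scale=10, prompt="a cat walking")
    for note in review_settings(draft):
        print("-", note)
```

A configuration that follows all the tips returns an empty list, so the function can double as a quick pre-flight check before queuing a long render.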
05 LightX2V Acceleration
LightX2V is an optimization technique that significantly speeds up Wan 2.2 generation. Through intelligent caching and optimized calculations, generation time can be reduced by up to 50% – with minimal quality loss. ComfyVault offers special workflows with LightX2V integration.
06 Video Post-Processing
Generated videos often benefit from post-processing: upscaling with Real-ESRGAN for higher resolution, frame interpolation with RIFE for smoother motion, and color correction for more consistent colors. These steps can be integrated directly into ComfyUI as part of the workflow.
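To get a feel for what upscaling and frame interpolation do to a clip, the arithmetic can be sketched as below. The function `after_post_processing` is hypothetical, and the example clip (832x480, 81 frames at 16 fps) is an assumed input chosen for illustration, not a value from this guide.

```python
def after_post_processing(width: int, height: int, frames: int, fps: float,
                          upscale: int = 2, interp: int = 2) -> dict:
    """Estimate clip properties after spatial upscaling and frame interpolation.

    Interpolation inserts (interp - 1) new frames between each pair of
    neighbouring frames and raises the frame rate by the same factor,
    so the clip's duration stays roughly the same.
    """
    new_frames = (frames - 1) * interp + 1
    return {
        "width": width * upscale,
        "height": height * upscale,
        "frames": new_frames,
        "fps": fps * interp,
    }


if __name__ == "__main__":
    # Assumed example clip: 832x480, 81 frames at 16 fps.
    print(after_post_processing(832, 480, frames=81, fps=16))
```

With a 2x upscale and 2x interpolation, the assumed clip becomes 1664x960 with 161 frames at 32 fps, which is why rendering at low resolution first and enhancing afterwards saves time.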
Hardware Recommendations
The best hardware for local AI generation. Our recommendations based on price-performance and compatibility.
Graphics Cards (GPU)
- NVIDIA RTX 3060 12GB (Entry, from ~$300): Best entry-level model for local AI. 12 GB VRAM is sufficient for SDXL and small LLMs.
- NVIDIA RTX 4070 Ti Super 16GB (Recommended, from ~$800): Ideal mid-range GPU. 16 GB VRAM for Flux, SDXL, and medium-sized LLMs.
- NVIDIA RTX 4090 24GB (High-End, from ~$1,800): High-end GPU for demanding models. 24 GB VRAM for Wan 2.2 14B and large LLMs.
- NVIDIA RTX 5090 32GB (Enthusiast, from ~$2,200): Maximum performance and VRAM. 32 GB for all current and future AI models.
* Affiliate links: If you purchase through these links, we receive a small commission at no additional cost to you. This helps us keep ComfyVault free.
No GPU? Rent Cloud GPUs
You don't need to buy an expensive GPU. Cloud GPU providers allow you to run AI models on powerful hardware by the hour.
- RunPod (Popular, from $0.20/hr): Ideal for testing large models without expensive hardware. Easy ComfyUI templates available.
- Vast.ai (Budget, from $0.10/hr): Cheapest cloud GPUs on the market, offered through a marketplace model. Perfect for longer training sessions.
- Lambda Cloud (Premium, from $1.10/hr): Premium cloud GPUs with A100/H100. For professional users who need maximum performance.
* Affiliate links: If you sign up through these links, we receive a small commission. There are no additional costs for you.
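Whether renting or buying is cheaper depends on how many hours you expect to generate. The sketch below divides a purchase price by an hourly rate using the "from" prices listed on this page. It ignores electricity, resale value, and the fact that the cheapest cloud rate may not correspond to a comparable GPU, so treat the result as a rough upper bound. `break_even_hours` is a hypothetical helper.

```python
def break_even_hours(gpu_price_usd: float, cloud_rate_usd_per_hr: float) -> float:
    """Hours of cloud rental that cost as much as buying the GPU outright."""
    if cloud_rate_usd_per_hr <= 0:
        raise ValueError("cloud rate must be positive")
    return gpu_price_usd / cloud_rate_usd_per_hr


if __name__ == "__main__":
    # "From" prices quoted on this page; actual prices will vary.
    hours = break_even_hours(gpu_price_usd=1800, cloud_rate_usd_per_hr=0.20)
    print(f"RTX 4090 vs. $0.20/hr rental: about {hours:.0f} hours")
```

At the premium rate of $1.10/hr the same calculation gives a much earlier break-even point, which is worth keeping in mind if you need A100/H100-class hardware regularly.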