Video AI

Wan 2.2: Text-to-Video & Image-to-Video

Everything about the Wan 2.2 video AI models – from setup to optimal video generation.

10 min read. Updated: February 4, 2026

Wan 2.2, Video, Text-to-Video, Image-to-Video

01 What Is Wan 2.2?

Wan 2.2 is a family of open-source video AI models that can be run locally. They enable Text-to-Video (T2V) and Image-to-Video (I2V) generation directly on your PC. Wan 2.2 has established itself as one of the best open-source video generators and is particularly well integrated into ComfyUI.

02 Model Variants

Wan 2.2 comes in various sizes:

Wan 2.2 TI2V-5B: Smaller variant that handles both text-to-video and image-to-video in a single model. 8 GB VRAM minimum. Fast generation, moderate quality. Ideal for experimenting.
Wan 2.2 T2V-A14B: Large variant for text-to-video. 24+ GB VRAM recommended. Significantly better quality and coherence. Recommended for final results.
Wan 2.2 I2V-A14B: Large variant for image-to-video. Takes an image as input and animates it. Best quality for image animation.
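
As a rough illustration of the VRAM guidance above, the following minimal sketch maps an amount of GPU memory to a variant size. The function name `suggest_variant` and the labels it returns are made up for this example, and the thresholds are only the figures quoted in the list; actual requirements also depend on precision, offloading, and resolution.

```python
def suggest_variant(vram_gb: float) -> str:
    """Return 'large', 'small', or 'insufficient' for a given amount of VRAM."""
    large_recommended_gb = 24  # "24+ GB VRAM recommended" for the large variants
    small_minimum_gb = 8       # "8 GB VRAM minimum" for the smaller variant
    if vram_gb >= large_recommended_gb:
        return "large"
    if vram_gb >= small_minimum_gb:
        return "small"
    return "insufficient"


# Example: a 12 GB card and a 24 GB card.
for gpu, vram in [("12 GB card", 12), ("24 GB card", 24)]:
    print(gpu, "->", suggest_variant(vram))
```

Treat the result as a starting point for experiments, not a guarantee that a given workflow will fit.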

03 Setup in ComfyUI

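To run Wan 2.2 in ComfyUI you need three files: the Wan 2.2 diffusion model (download from Hugging Face), the UMT5-XXL text encoder (loaded through ComfyUI's CLIP loader node, although it is not a CLIP model), and the matching VAE, which is required to decode the generated latents into video frames. Place the files in the corresponding ComfyUI folders. Use our pre-configured Wan 2.2 workflows from the ComfyVault Gallery for the fastest start.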
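
Before loading a workflow, it can help to confirm that each file sits in the folder ComfyUI reads it from. The sketch below assumes the folder layout of a recent ComfyUI install (`models/diffusion_models`, `models/text_encoders`, `models/vae`; older installs may use `models/unet` and `models/clip` instead). The helper `missing_model_files` and all filenames are hypothetical placeholders, not the actual download names.

```python
from pathlib import Path

# Subfolders of a ComfyUI install where each file type is expected (assumed layout).
MODEL_DIRS = {
    "diffusion_model": "models/diffusion_models",
    "text_encoder": "models/text_encoders",
    "vae": "models/vae",
}


def missing_model_files(comfy_root, expected):
    """Return the expected files that are not yet in their ComfyUI folder.

    `expected` maps a kind from MODEL_DIRS to a filename.
    """
    root = Path(comfy_root)
    missing = []
    for kind, filename in expected.items():
        target = root / MODEL_DIRS[kind] / filename
        if not target.is_file():
            missing.append(target)
    return missing


if __name__ == "__main__":
    # Placeholder filenames: replace them with the files you actually downloaded.
    expected = {
        "diffusion_model": "wan2.2_model.safetensors",
        "text_encoder": "text_encoder.safetensors",
        "vae": "vae.safetensors",
    }
    for path in missing_model_files("ComfyUI", expected):
        print("missing:", path)
```

An empty result means every listed file was found; it does not check that the files are the right versions for each other.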

04 Optimal Settings

Tips for the best video quality:

Start with lower resolution (480p) and then upscale – saves a lot of time when experimenting
Use 30–50 sampling steps for a good quality-speed ratio
CFG Scale: 6–8 for natural movements, higher for stronger prompt fidelity
Use short, concise prompts – video models prefer clarity over detail
For I2V: Use high-quality input images – input quality determines output quality
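
To get a feel for why drafting at 480p saves time, the tips above can be written down as two settings presets and compared with a crude cost proxy. This is a sketch under stated assumptions: the `VideoSettings` class is invented for the example, the resolutions 832x480 and 1280x720 and the 81-frame clip length are assumed typical values, and real generation time grows faster than this linear estimate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoSettings:
    width: int
    height: int
    frames: int
    steps: int
    cfg: float

    def relative_cost(self) -> int:
        """Crude cost proxy: pixels per frame x frames x sampling steps."""
        return self.width * self.height * self.frames * self.steps


# Draft pass: 480p, lower end of the 30-50 step range, CFG inside 6-8.
draft = VideoSettings(width=832, height=480, frames=81, steps=30, cfg=6.0)
# Direct high-resolution pass: 720p, upper end of the step range.
final = VideoSettings(width=1280, height=720, frames=81, steps=50, cfg=6.0)

ratio = final.relative_cost() / draft.relative_cost()
print(f"720p at 50 steps costs roughly {ratio:.1f}x the 480p draft")
```

Even under this optimistic estimate, each draft is several times cheaper, which is why iterating on prompts at 480p and upscaling only the keepers pays off.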

05 LightX2V Acceleration

Tip

LightX2V is an optimization technique that significantly speeds up Wan 2.2 generation. Through intelligent caching and optimized calculations, generation time can be reduced by up to 50% – with minimal quality loss. ComfyVault offers special workflows with LightX2V integration.

06 Video Post-Processing

Generated videos often benefit from post-processing: upscaling with Real-ESRGAN for higher resolution, frame interpolation with RIFE for smoother motion, and color correction for more consistent colors. These steps can be integrated directly into ComfyUI as part of the workflow.
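
Outside ComfyUI, the same three steps (resize, frame interpolation, color adjustment) can be approximated with ffmpeg's classical filters. The sketch below is a stand-in, not the AI-based tools named above: it uses ffmpeg's `scale`, `minterpolate`, and `eq` filters, and the function name, file names, and default values (including the 32 fps target, which assumes a 16 fps source) are illustrative assumptions.

```python
import shlex


def build_postprocess_cmd(src, dst, width=1280, height=720, fps=32, saturation=1.0):
    """Build an ffmpeg command that resizes, interpolates, and color-adjusts a clip."""
    filters = ",".join([
        f"scale={width}:{height}:flags=lanczos",  # classical resize, not AI upscaling
        f"minterpolate=fps={fps}",                # motion-interpolated target frame rate
        f"eq=saturation={saturation}",            # basic color adjustment
    ])
    return ["ffmpeg", "-y", "-i", src, "-vf", filters, dst]


cmd = build_postprocess_cmd("wan_output.mp4", "wan_final.mp4")
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True), which requires ffmpeg on PATH.
```

Building the command as a list keeps file names with spaces safe and makes it easy to inspect the filter chain before running anything.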

Recommended Hardware

The best hardware for local AI generation. Our recommendations based on price-performance and compatibility.

Graphics Cards (GPU)

NVIDIA RTX 3060 12GB (Entry, from ~$300): Best entry-level model for local AI. 12 GB VRAM is sufficient for SDXL and small LLMs.
NVIDIA RTX 4070 Ti Super 16GB (Recommended, from ~$800): Ideal mid-range GPU. 16 GB VRAM for Flux, SDXL, and medium-sized LLMs.
NVIDIA RTX 4090 24GB (High-End, from ~$1,800): High-end GPU for demanding models. 24 GB VRAM for Wan 2.2 14B and large LLMs.
NVIDIA RTX 5090 32GB (Enthusiast, from ~$2,200): Maximum performance and VRAM. 32 GB for all current and future AI models.
