Wan 2.2: Text-to-Video & Image-to-Video

Everything about the Wan 2.2 video AI models – from setup to optimal video generation.

10 min read · Updated: February 4, 2026

01 What Is Wan 2.2?

Wan 2.2 is a family of open-source video AI models that can be run locally. They enable Text-to-Video (T2V) and Image-to-Video (I2V) generation directly on your PC. Wan 2.2 has established itself as one of the best open-source video generators and is particularly well integrated into ComfyUI.

02 Model Variants

Wan 2.2 comes in two sizes:

  • Wan 2.2 TI2V-5B: Smaller variant that handles both text-to-video and image-to-video in a single model. 8 GB VRAM minimum. Fast generation, moderate quality. Ideal for experimenting.
  • Wan 2.2 T2V-A14B: Large variant for text-to-video. 24+ GB VRAM recommended. Significantly better quality and coherence. Recommended for final results.
  • Wan 2.2 I2V-A14B: Large variant for image-to-video. Takes an image as input and animates it. Best quality for image animation.
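
As a rough illustration of the VRAM guidance above, the following sketch maps available GPU memory to a model size. The function name and the labels it returns are hypothetical. The thresholds are simply the 8 GB and 24 GB figures from the list, assumed here to apply to text-to-video and image-to-video alike.

```python
def suggest_model_size(vram_gb: float) -> str:
    """Map available VRAM (in GB) to a model size label.

    Thresholds are taken from the list above (8 GB minimum for the
    smaller variant, 24+ GB recommended for the large one). The labels
    are illustrative, not official model names.
    """
    if vram_gb < 0:
        raise ValueError("vram_gb must be non-negative")
    if vram_gb >= 24:
        return "large"         # 14B-class model, best quality
    if vram_gb >= 8:
        return "small"         # smaller variant, good for experimenting
    return "insufficient"      # below the stated 8 GB minimum


if __name__ == "__main__":
    for gb in (6, 12, 24):
        print(f"{gb} GB -> {suggest_model_size(gb)}")
```

Treat the result as a starting point only: actual memory use also depends on resolution, clip length, and any offloading or quantization in the workflow.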

03 Setup in ComfyUI

For Wan 2.2 in ComfyUI you need three things: the Wan 2.2 model (download from Hugging Face), the matching text encoder (loaded through ComfyUI's CLIP loader node), and the VAE, which is required to decode the generated frames. Place the files in the corresponding ComfyUI model folders. For the fastest start, use our pre-configured Wan 2.2 workflows from the ComfyVault Gallery.
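
To check that the files ended up where ComfyUI expects them, a small script along these lines can help. It is a minimal sketch: the folder names assume a recent default ComfyUI layout (`models/diffusion_models`, `models/text_encoders`, `models/vae`), the helper name `find_missing` is made up for this example, and it only looks for `.safetensors` files without checking that they are the right ones.

```python
from pathlib import Path

# Assumption: default subfolders of a recent ComfyUI install.
# Older installs or custom path configs may use different folders.
EXPECTED_FOLDERS = {
    "video model": "models/diffusion_models",
    "text encoder": "models/text_encoders",
    "VAE": "models/vae",
}


def find_missing(comfy_root: str) -> list[str]:
    """Return the components whose folder contains no .safetensors file."""
    root = Path(comfy_root)
    missing = []
    for name, rel_path in EXPECTED_FOLDERS.items():
        folder = root / rel_path
        # A missing folder counts the same as an empty one.
        if not folder.is_dir() or not any(folder.glob("*.safetensors")):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # "ComfyUI" is a placeholder; point this at your own install directory.
    gaps = find_missing("ComfyUI")
    print("All components found" if not gaps else f"Missing: {', '.join(gaps)}")
```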

04 Optimal Settings

Tips for the best video quality:

  • Start with lower resolution (480p) and then upscale – saves a lot of time when experimenting
  • Use 30–50 sampling steps for a good quality-speed ratio
  • CFG Scale: 6–8 for natural movements, higher for stronger prompt fidelity
  • Use short, concise prompts – video models prefer clarity over detail
  • For I2V: Use high-quality input images – input quality determines output quality
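
The tips above can be captured as a small settings check. This is a sketch under assumptions: `DraftSettings` and `review` are hypothetical names rather than part of ComfyUI or Wan 2.2, and the 40-word prompt limit is an arbitrary stand-in for "short and concise".

```python
from dataclasses import dataclass


@dataclass
class DraftSettings:
    height: int = 480      # vertical resolution in pixels (480p draft)
    steps: int = 40        # sampling steps
    cfg: float = 7.0       # CFG scale
    prompt: str = ""


def review(s: DraftSettings, max_prompt_words: int = 40) -> list[str]:
    """Return a note for every setting that departs from the tips above."""
    notes = []
    if s.height > 480:
        notes.append("consider drafting at 480p and upscaling afterwards")
    if not 30 <= s.steps <= 50:
        notes.append("steps are outside the suggested 30-50 range")
    if s.cfg < 6:
        notes.append("CFG is below the suggested 6-8 range")
    elif s.cfg > 8:
        notes.append("CFG above 8 favors prompt fidelity over natural motion")
    if len(s.prompt.split()) > max_prompt_words:
        notes.append("prompt is long; short, concise prompts are suggested")
    return notes


if __name__ == "__main__":
    draft = DraftSettings(height=720, steps=20, cfg=9.0, prompt="a cat walking on a beach")
    for note in review(draft):
        print("-", note)
```

The notes are reminders rather than errors, since a higher CFG or resolution can be a deliberate choice for a final render.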

05 LightX2V Acceleration

Tip

LightX2V is an optimization technique that significantly speeds up Wan 2.2 generation. Through intelligent caching and optimized calculations, generation time can be reduced by up to 50% – with minimal quality loss. ComfyVault offers special workflows with LightX2V integration.

06 Video Post-Processing

Generated videos often benefit from post-processing: upscaling with Real-ESRGAN for higher resolution, frame interpolation with RIFE for smoother motion, and color correction for more consistent colors. These steps can be integrated directly into ComfyUI as part of the workflow.
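
For a quick pass outside ComfyUI, the same three steps can be approximated with ffmpeg's built-in filters. Note that this sketch swaps in classical filters (`minterpolate`, `scale`, `eq`) in place of the AI models named above, so expect lower quality. The function name, file names, and default values are assumptions chosen for illustration.

```python
import shlex


def build_ffmpeg_command(src: str, dst: str,
                         target_fps: int = 48,
                         target_width: int = 1280) -> list[str]:
    """Build an ffmpeg command that interpolates, upscales, and color-tweaks a clip."""
    filters = ",".join([
        # Motion-interpolated frames up to target_fps. This runs first,
        # while the frames are still small, which keeps it cheaper.
        f"minterpolate=fps={target_fps}",
        # Upscale to target_width; -2 keeps the aspect ratio with an even height.
        f"scale={target_width}:-2:flags=lanczos",
        # Placeholder color tweak: a slight saturation boost.
        "eq=saturation=1.05",
    ])
    return ["ffmpeg", "-y", "-i", src, "-vf", filters, dst]


if __name__ == "__main__":
    # Prints the command instead of running it, so it can be inspected first.
    print(shlex.join(build_ffmpeg_command("wan_clip.mp4", "wan_clip_post.mp4")))
```

To run it, pass the returned list to `subprocess.run(cmd, check=True)` on a machine with ffmpeg installed.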
