HunyuanVideo-1.5: a lightweight 8.3B parameter video generation model for high-quality synthesis on consumer GPUs

What it solves

HunyuanVideo-1.5 is a lightweight video generation model that lowers the computational barrier to high-quality video synthesis. It lets developers and creators generate high-resolution, professional-grade video on consumer-grade GPUs instead of large-scale industrial hardware.

How it works

The model pairs an 8.3-billion-parameter Diffusion Transformer (DiT) with a 3D causal VAE that compresses video along both the spatial and temporal dimensions. A Selective and Sliding Tile Attention (SSTA) mechanism prunes redundant data from the attention computation to accelerate inference. A video super-resolution (VSR) network then upscales outputs to 1080p to improve quality. The model supports both text-to-video (T2V) and image-to-video (I2V) generation, and step-distilled variants are available for faster generation.
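To make the tile-attention idea above more concrete, here is a minimal sketch of how a sliding window over tiles and a selective top-k step can each shrink the set of tile pairs that attention has to process. It is an illustration only, not the project's SSTA implementation: the function names, the 1D tile layout, and the importance scores are all hypothetical, and the real mechanism works on spatiotemporal tiles inside the DiT.

```python
def sliding_tile_mask(num_tiles, window):
    """Tile-level mask: entry [q][k] is True when query tile q may attend to key tile k.

    Hypothetical 1D layout; a video model would tile over time, height, and width.
    """
    return [[abs(q - k) <= window for k in range(num_tiles)]
            for q in range(num_tiles)]


def select_top_tiles(scores, keep):
    """Indices of the `keep` highest-scoring key tiles (scores are assumed inputs)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


def kept_fraction(mask):
    """Share of query/key tile pairs that still need attention computed."""
    total = sum(len(row) for row in mask)
    kept = sum(sum(row) for row in mask)
    return kept / total
```

With 4 tiles and a window of 1, only 10 of the 16 tile pairs remain, so `kept_fraction(sliding_tile_mask(4, 1))` is 0.625. The saving grows with sequence length because the number of kept pairs scales roughly linearly with the tile count rather than quadratically.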

Who it’s for

It is intended for developers, AI researchers, and digital creators who want to generate high-quality video on accessible hardware (a minimum of 14GB of GPU memory), as well as those who want to integrate video generation into their own applications through tools such as ComfyUI or Diffusers.
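A rough back-of-the-envelope calculation shows why the 14GB figure depends on memory-saving techniques. The sketch below counts only the 8.3 billion weights mentioned above and ignores text encoders, the VAE, and activations, so the real footprint is higher. The helper name and the byte sizes per dtype are illustrative assumptions, not values taken from the project.

```python
PARAMS = 8.3e9  # parameter count stated for the DiT

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}


def weight_gb(params, dtype):
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("fp32", "bf16", "fp8"):
    print(f"{dtype}: {weight_gb(PARAMS, dtype):.1f} GB")
```

At 16-bit precision the weights alone come to about 16.6 GB, which exceeds 14 GB, so keeping only part of the model on the GPU at any time (offloading) or storing weights at lower precision is needed to fit.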

Highlights

  • Consumer-Grade Accessibility: Runs on NVIDIA GPUs with as little as 14GB of VRAM using model offloading.
  • High-Performance Architecture: Uses SSTA to achieve significant speedups in 720p video synthesis.
  • Flexible Generation: Supports both Text-to-Video and Image-to-Video workflows across various resolutions.
  • Advanced Optimizations: Includes support for FP8 GEMM, cache inference (DeepCache, TeaCache, TaylorCache), and step-distilled models for rapid generation.
  • Super-Resolution: An integrated few-step network upscales videos to 1080p for improved sharpness and texture.
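
The cache inference methods listed above share one general idea: when consecutive denoising steps see nearly the same input, the expensive model call can be skipped and an earlier result reused. The sketch below shows that idea in its simplest form. It is not the algorithm of DeepCache, TeaCache, or TaylorCache; the class name, the threshold, and the relative-L1 change measure are assumptions chosen for illustration.

```python
def relative_l1(current, previous, eps=1e-8):
    """Relative L1 change between two equally sized lists of floats."""
    diff = sum(abs(c - p) for c, p in zip(current, previous))
    norm = sum(abs(p) for p in previous)
    return diff / (norm + eps)


class StepCache:
    """Reuse the last model output while accumulated input change stays small."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.accumulated = 0.0
        self.prev_input = None
        self.cached_output = None
        self.computed_steps = 0

    def __call__(self, model, x):
        reuse = False
        if self.prev_input is not None:
            # Track how far the input has drifted since the last real computation.
            self.accumulated += relative_l1(x, self.prev_input)
            reuse = self.accumulated < self.threshold
        self.prev_input = list(x)
        if reuse:
            return self.cached_output
        # Drift is too large (or this is the first step): recompute and reset.
        self.accumulated = 0.0
        self.cached_output = model(x)
        self.computed_steps += 1
        return self.cached_output
```

A caller would wrap each sampling step as `cache(model, latent)`. A higher threshold skips more steps and runs faster, at the cost of outputs that drift further from the uncached result.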
