vipshop/cache-dit

⚡️🎉A PyTorch-native Inference Engine with Cache,
Parallelism, Quantization and CPU Offload for DiTs

🤗Why Cache-DiT❓❓

Cache-DiT is built on top of the 🤗Diffusers library and now supports nearly ALL DiTs from Diffusers. It provides hybrid cache acceleration (DBCache, TaylorSeer, SCM, etc.) and comprehensive parallelism optimizations, including Context Parallelism, Tensor Parallelism, hybrid 2D or 3D parallelism, and dedicated extra parallelism support for Text Encoder, VAE, and ControlNet.

Cache-DiT is compatible with compilation, CPU offloading, and quantization; fully integrates with SGLang Diffusion, vLLM-Omni, TensorRT-LLM, and ComfyUI; and runs natively on NVIDIA GPUs, Ascend NPUs, and AMD GPUs. Cache-DiT is fast, easy to use, and flexible across various DiTs (online docs at 📘cache-dit.io). The technical report is available at 🎉Cache-DiT: A Unified PyTorch-Native Inference Engine for Diffusion Transformers.
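To give a feel for the cache acceleration mentioned above, here is a minimal, library-free sketch of residual-difference caching, the general idea behind block-cache schemes of this kind. It is a toy under stated assumptions, not Cache-DiT's implementation: `ResidualCache`, `rel_l1`, the threshold value, and the two stand-in "blocks" are all hypothetical names chosen for illustration. A cheap first block is always run; if its residual barely changed since the last fully computed step, the expensive remaining blocks are skipped and their cached residual is reused.

```python
from typing import Callable, List, Optional

Vector = List[float]


def rel_l1(a: Vector, b: Vector) -> float:
    """Relative L1 distance of `a` with respect to `b`."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(abs(y) for y in b) or 1e-12
    return num / den


class ResidualCache:
    """Toy residual-difference cache (hypothetical, for illustration only)."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.prev_probe: Optional[Vector] = None       # first-block residual of the last computed step
        self.cached_residual: Optional[Vector] = None  # residual added by the remaining blocks
        self.hits = 0

    def step(
        self,
        x: Vector,
        first_block: Callable[[Vector], Vector],
        remaining_blocks: Callable[[Vector], Vector],
    ) -> Vector:
        # The first block always runs; its residual acts as a cheap change probe.
        h = first_block(x)
        probe = [hi - xi for hi, xi in zip(h, x)]

        can_reuse = (
            self.prev_probe is not None
            and self.cached_residual is not None
            and rel_l1(probe, self.prev_probe) < self.threshold
        )
        if can_reuse:
            # Cache hit: skip the expensive blocks and reuse their last residual.
            self.hits += 1
            return [hi + ri for hi, ri in zip(h, self.cached_residual)]

        # Cache miss: run everything and refresh the cache.
        out = remaining_blocks(h)
        self.cached_residual = [oi - hi for oi, hi in zip(out, h)]
        self.prev_probe = probe
        return out


# Stand-ins for transformer blocks (assumptions, not real DiT layers).
first = lambda v: [1.1 * t for t in v]
rest = lambda v: [t + 0.5 for t in v]

cache = ResidualCache(threshold=0.05)
outputs = [cache.step(x, first, rest) for x in ([1.0, 2.0], [1.01, 2.0], [3.0, -1.0])]
print(cache.hits, outputs)
```

In this run the second input is close enough to the first that the remaining blocks are skipped, while the third input changes the probe too much and forces a full recompute. The threshold trades speed against fidelity: a larger value skips more steps but reuses staler residuals.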

⚡️9x speedup by Cache-DiT with Cache, Context Parallelism and Compilation

📋Supported DiT Models

Cache-DiT supports 40+ DiT pipeline families (120+ variants) from 🤗Diffusers, covering the vast majority of DiT-based pipelines. For the full support matrix and detailed usage, please refer to our documentation at 📘cache-dit.io.

| Modality | Pipeline Series | Transformer | Variants | C/P/Q/OF |
|---|---|---|---|---|
| Image | FLUX | FluxTransformer2DModel | 10+ | ✔️/✔️/✔️/✔️ |
| Image | FLUX.2 | Flux2Transformer2DModel | 1 | ✔️/✔️/✔️/✔️ |
| Image | FLUX.2 Klein | Flux2Transformer2DModel | 3 | ✔️/✔️/✔️/✔️ |
| Image | ERNIE-Image | ErnieImageTransformer2DModel | 1 | ✔️/✔️/✔️/✔️ |
| Image | Qwen-Image | QwenImageTransformer2DModel | 9+ | ✔️/✔️/✔️/✔️ |
| Image | Z-Image | ZImageTransformer2DModel | 6 | ✔️/✔️/✔️/✔️ |
| Image | Joy-Image | JoyImageEditTransformer3DModel | 1 | ✔️/✔️/✔️/✔️ |
| Image | Boogu-Image | BooguImageTransformer2DModel | 1 | ✔️/✔️/✔️/✔️ |
| Image | Krea-2 | Krea2Transformer2DModel | 1 | ✔️/✔️/✔️/✔️ |
| Image | Bria-Fibo | BriaFiboTransformer2DModel | 2 | ✔️/✔️/✔️/✔️ |
| Image | GLM-Image | GlmImageTransformer2DModel | 1 | ✔️/✔️/✔️/✔️ |
| Image | LongCat-Image | LongCatImageTransformer2DModel | 2 | ✔️/✔️/✔️/✔️ |
| Image | Ovis-Image | OvisImageTransformer2DModel | 1 | ✔️/✔️/✔️/✔️ |
| Image | SD3 | SD3Transformer2DModel | 3 | ✔️/✖️/✔️/✔️ |
| Image | PixArt-Alpha | PixArtTransformer2DModel | 1 | ✔️/✔️/✔️/✔️ |
| Image | PixArt-Sigma | PixArtTransformer2DModel | 1 | ✔️/✔️/✔️/✔️ |
| Image | Sana (image) | SanaTransformer2DModel | 4 | ✔️/✖️/✔️/✔️ |
| Image | DiT | DiTTransformer2DModel | 1 | ✔️/✔️/✔️/✔️ |
| Image | HunyuanDiT | HunyuanDiT2DModel | 1 | ✔️/✔️/✔️/✔️ |
| Image | HunyuanDiT-PAG | HunyuanDiT2DModel | 1 | ✔️/✔️/✔️/✔️ |
| Image | AuraFlow | AuraFlowTransformer2DModel | 1 | ✔️/✖️/✔️/✔️ |
| Image | CogView4 | CogView4Transformer2DModel | 2 | ✔️/✔️/✔️/✔️ |
| Image | CogView3Plus | CogView3PlusTransformer2DModel | 1 | ✔️/✔️/✔️/✔️ |
| Image | Hunyuan-Image | HunyuanImageTransformer2DModel | 2 | ✔️/✔️/✔️/✔️ |
| Image | HiDream | HiDreamImageTransformer2DModel | 1 | ✔️/✖️/✔️/✔️ |
| Image | Bria | BriaTransformer2DModel | 1 | ✔️/✖️/✔️/✔️ |
| Image | Chroma | ChromaTransformer2DModel | 3 | ✔️/✔️/✔️/✔️ |
| Image | PRX | PRXTransformer2DModel | 2 | ✔️/✖️/✔️/✔️ |
| Image | OmniGen | OmniGenTransformer2DModel | 1 | ✔️/✖️/✔️/✔️ |
| Image | Lumina | LuminaNextDiT2DModel | 2 | ✔️/✖️/✔️/✔️ |
| Image | Lumina 2 | Lumina2Transformer2DModel | 2 | ✔️/✖️/✔️/✔️ |
| Image | VisualCloze | FluxTransformer2DModel | 2 | ✔️/✔️/✔️/✔️ |
| Image | Amused | UVit2DModel | 1 | ✔️/✖️/✔️/✔️ |
| Video | CogVideoX | CogVideoXTransformer3DModel | 4 | ✔️/✔️/✔️/✔️ |
| Video | Wan (T2V/I2V) | WanTransformer3DModel | 3 | ✔️/✔️/✔️/✔️ |
| Video | Wan-VACE | WanVACETransformer3DModel | 1 | ✔️/✔️/✔️/✔️ |
| Video | HunyuanVideo | HunyuanVideoTransformer3DModel | 4 | ✔️/✔️/✔️/✔️ |
| Video | HunyuanVideo 1.5 | HunyuanVideo15Transformer3DModel | 2 | ✔️/✖️/✔️/✔️ |
| Video | Mochi | MochiTransformer3DModel | 1 | ✔️/✔️/✔️/✔️ |
| Video | Allegro | AllegroTransformer3DModel | 1 | ✔️/✖️/✔️/✔️ |
| Video | EasyAnimate | EasyAnimateTransformer3DModel | 3 | ✔️/✖️/✔️/✔️ |
| Video | ConsisID | ConsisIDTransformer3DModel | 1 | ✔️/✔️/✔️/✔️ |
| Video | Cosmos | CosmosTransformer3DModel | 7+ | ✔️/✖️/✔️/✔️ |
| Video | LTX | LTXVideoTransformer3DModel | 5 | ✔️/✔️/✔️/✔️ |
| Video | LTX2 | LTX2VideoTransformer3DModel | 6 | ✔️/✔️/✔️/✔️ |
| Video | Helios | HeliosTransformer3DModel | 2 | ✔️/✔️/✔️/✔️ |
| Video | ChronoEdit | ChronoEditTransformer3DModel | 1 | ✔️/✔️/✔️/✔️ |
| Video | SkyReelsV2 | SkyReelsV2Transformer3DModel | 5 | ✔️/✔️/✔️/✔️ |
| Video | Kandinsky5 (T2V/I2V) | Kandinsky5Transformer3DModel | 4 | ✔️/✔️/✔️/✔️ |
| Audio | StableAudio | StableAudioDiTModel | 1 | ✔️/✖️/✔️/✔️ |
| 3D/Other | Shap-E | PriorTransformer | 1 | ✔️/✖️/✔️/✔️ |
| Other | LucyEdit | WanTransformer3DModel | 1 | ✔️/✔️/✔️/✔️ |

C: Hybrid Cache (DBCache + Calibrator: TaylorSeer/DMD/FoCa/SCM); P: Parallelism (Ulysses/Ring/USP/TP/TE-P/VAE-P/2D-P/3D-P); Q: Quantization (W8A8, W4A4); OF: Bucket-style Layerwise Offload (~0 overhead)
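For readers unfamiliar with the "W8A8" shorthand in the legend above (8-bit weights, 8-bit activations), the following is a rough, pure-Python sketch of the arithmetic. It assumes symmetric per-tensor scaling, which is only one of several common schemes; the README does not say which granularity or backend Cache-DiT uses, and `quantize_int8` and `w8a8_dot` are hypothetical helper names, not part of the library.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to the int8 range [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def w8a8_dot(weights: List[float], activations: List[float]) -> float:
    """Dot product with both operands quantized to int8 (the 'W8A8' idea)."""
    qw, w_scale = quantize_int8(weights)
    qa, a_scale = quantize_int8(activations)
    # Accumulate in integers, then rescale once back to floating point.
    acc = sum(w * a for w, a in zip(qw, qa))
    return acc * w_scale * a_scale


weights = [0.5, -1.0, 0.25]
activations = [2.0, 1.0, -4.0]

exact = sum(w * a for w, a in zip(weights, activations))
approx = w8a8_dot(weights, activations)
print(f"exact={exact:.4f} approx={approx:.4f}")
```

The quantized result stays close to the full-precision one while the inner loop uses only small integers. W4A4 follows the same pattern with a narrower integer range, and so a larger rounding error.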

🤖Agentic Workflows

Cache-DiT provides a model-integration SKILL to help users integrate new DiT pipelines into Cache-DiT, including Cache, CP, TP, TE-P, VAE-P and carefully designed test cases. Users can use it with coding agents, e.g., GitHub Copilot, Claude Code, Open Code.

Note

Please note that quantization and layerwise offload in Cache-DiT generally work on any nn.Module, so no extra integration is needed for new DiT pipelines or transformers.

🌐Community Integration

©️Acknowledgements

Special thanks to vipshop's Computer Vision AI Team for supporting the testing and deployment of this project. We learned from and reused code from: Diffusers, SGLang, vLLM-Omni, Nunchaku, xDiT and TaylorSeer.

©️Citations

@misc{cache-dit@2025,
  title={Cache-DiT: A PyTorch-native Inference Engine with Cache, Parallelism, Quantization and CPU Offload for DiTs.},
  url={https://github.com/vipshop/cache-dit.git},
  note={Open-source software available at https://github.com/vipshop/cache-dit.git},
  author={DefTruth, vipshop.com, etc.},
  year={2025}
}
