🤗Why Cache-DiT❓❓

Cache-DiT is built on top of the 🤗Diffusers library and now supports nearly ALL DiTs from Diffusers. It provides hybrid cache acceleration (DBCache, TaylorSeer, SCM, etc.) and comprehensive parallelism optimizations, including Context Parallelism, Tensor Parallelism, hybrid 2D or 3D parallelism, and dedicated extra parallelism support for Text Encoder, VAE, and ControlNet.
Cache-DiT is compatible with compilation, CPU Offloading, and quantization; integrates with SGLang Diffusion, vLLM-Omni, TensorRT-LLM, and ComfyUI; and runs natively on NVIDIA GPUs, Ascend NPUs, and AMD GPUs. Cache-DiT is fast, easy to use, and flexible for various DiTs (online docs at 📘cache-dit.io). The technical report is available at 🎉Cache-DiT: A Unified PyTorch-Native Inference Engine for Diffusion Transformers.
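To give a rough feel for what step-wise cache acceleration means, here is a toy sketch of a generic residual-threshold cache in plain Python. It is an illustration only and not Cache-DiT's implementation or API: the class `ToyResidualCache`, the parameters `n_front` and `threshold`, and the scalar "blocks" are all assumptions made for this example. The idea sketched is that a few leading blocks always run, and if their residual barely changed since the previous denoising step, the cached residual of the remaining blocks is reused instead of recomputing them.

```python
from typing import Callable, List, Optional

Vector = List[float]
Block = Callable[[Vector], Vector]


def rel_l1(a: Vector, b: Vector) -> float:
    """Relative L1 distance of a from b (b is the reference)."""
    denom = sum(abs(x) for x in b) or 1e-12
    return sum(abs(x - y) for x, y in zip(a, b)) / denom


class ToyResidualCache:
    """Hypothetical toy cache; not part of the Cache-DiT API."""

    def __init__(self, blocks: List[Block], n_front: int = 1, threshold: float = 0.05):
        self.blocks = blocks
        self.n_front = n_front          # leading blocks that always run
        self.threshold = threshold      # max relative change that still allows reuse
        self.prev_front_residual: Optional[Vector] = None
        self.cached_tail_residual: Optional[Vector] = None
        self.hits = 0                   # number of steps that reused the cache

    def forward(self, x: Vector) -> Vector:
        # Always compute the leading blocks.
        h = x
        for blk in self.blocks[: self.n_front]:
            h = blk(h)
        front_residual = [a - b for a, b in zip(h, x)]

        # Reuse the cache only if the leading residual is close to the previous step's.
        can_reuse = (
            self.prev_front_residual is not None
            and self.cached_tail_residual is not None
            and rel_l1(front_residual, self.prev_front_residual) < self.threshold
        )
        self.prev_front_residual = front_residual

        if can_reuse:
            self.hits += 1
            return [a + r for a, r in zip(h, self.cached_tail_residual)]

        # Cache miss: run the remaining blocks and store their residual.
        out = h
        for blk in self.blocks[self.n_front :]:
            out = blk(out)
        self.cached_tail_residual = [a - b for a, b in zip(out, h)]
        return out


# Four identical toy blocks, each adding 10% of its input.
toy_blocks: List[Block] = [lambda v: [t + 0.1 * t for t in v] for _ in range(4)]
cache = ToyResidualCache(toy_blocks, n_front=1, threshold=0.05)

out1 = cache.forward([1.0, 2.0])    # first step: full compute
out2 = cache.forward([1.01, 2.01])  # similar input: tail residual is reused
out3 = cache.forward([5.0, -3.0])   # very different input: full compute again
```

In this toy run only the second step reuses the cache, and its output is an approximation of the full computation. Real caches trade a small, controlled error like this for skipped transformer blocks; the actual criteria and calibrators used by Cache-DiT are described in its documentation.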
Cache-DiT supports 40+ DiT pipeline families (120+ variants) from 🤗Diffusers, covering the vast majority of DiT-based pipelines. For the full support matrix and detailed usage, please refer to our documentation at 📘cache-dit.io.
| Modality | Pipeline Series | Transformer | Variants | C/P/Q/OF |
|---|---|---|---|---|
| Image | FLUX | FluxTransformer2DModel | 10+ | ✔️/✔️/✔️/✔️ |
| Image | FLUX.2 | Flux2Transformer2DModel | 1 | ✔️/✔️/✔️/✔️ |
| Image | FLUX.2 Klein | Flux2Transformer2DModel | 3 | ✔️/✔️/✔️/✔️ |
| Image | ERNIE-Image | ErnieImageTransformer2DModel | 1 | ✔️/✔️/✔️/✔️ |
| Image | Qwen-Image | QwenImageTransformer2DModel | 9+ | ✔️/✔️/✔️/✔️ |
| Image | Z-Image | ZImageTransformer2DModel | 6 | ✔️/✔️/✔️/✔️ |
| Image | Joy-Image | JoyImageEditTransformer3DModel | 1 | ✔️/✔️/✔️/✔️ |
| Image | Boogu-Image | BooguImageTransformer2DModel | 1 | ✔️/✔️/✔️/✔️ |
| Image | Krea-2 | Krea2Transformer2DModel | 1 | ✔️/✔️/✔️/✔️ |
| Image | Bria-Fibo | BriaFiboTransformer2DModel | 2 | ✔️/✔️/✔️/✔️ |
| Image | GLM-Image | GlmImageTransformer2DModel | 1 | ✔️/✔️/✔️/✔️ |
| Image | LongCat-Image | LongCatImageTransformer2DModel | 2 | ✔️/✔️/✔️/✔️ |
| Image | Ovis-Image | OvisImageTransformer2DModel | 1 | ✔️/✔️/✔️/✔️ |
| Image | SD3 | SD3Transformer2DModel | 3 | ✔️/✖️/✔️/✔️ |
| Image | PixArt-Alpha | PixArtTransformer2DModel | 1 | ✔️/✔️/✔️/✔️ |
| Image | PixArt-Sigma | PixArtTransformer2DModel | 1 | ✔️/✔️/✔️/✔️ |
| Image | Sana (image) | SanaTransformer2DModel | 4 | ✔️/✖️/✔️/✔️ |
| Image | DiT | DiTTransformer2DModel | 1 | ✔️/✔️/✔️/✔️ |
| Image | HunyuanDiT | HunyuanDiT2DModel | 1 | ✔️/✔️/✔️/✔️ |
| Image | HunyuanDiT-PAG | HunyuanDiT2DModel | 1 | ✔️/✔️/✔️/✔️ |
| Image | AuraFlow | AuraFlowTransformer2DModel | 1 | ✔️/✖️/✔️/✔️ |
| Image | CogView4 | CogView4Transformer2DModel | 2 | ✔️/✔️/✔️/✔️ |
| Image | CogView3Plus | CogView3PlusTransformer2DModel | 1 | ✔️/✔️/✔️/✔️ |
| Image | Hunyuan-Image | HunyuanImageTransformer2DModel | 2 | ✔️/✔️/✔️/✔️ |
| Image | HiDream | HiDreamImageTransformer2DModel | 1 | ✔️/✖️/✔️/✔️ |
| Image | Bria | BriaTransformer2DModel | 1 | ✔️/✖️/✔️/✔️ |
| Image | Chroma | ChromaTransformer2DModel | 3 | ✔️/✔️/✔️/✔️ |
| Image | PRX | PRXTransformer2DModel | 2 | ✔️/✖️/✔️/✔️ |
| Image | OmniGen | OmniGenTransformer2DModel | 1 | ✔️/✖️/✔️/✔️ |
| Image | Lumina | LuminaNextDiT2DModel | 2 | ✔️/✖️/✔️/✔️ |
| Image | Lumina 2 | Lumina2Transformer2DModel | 2 | ✔️/✖️/✔️/✔️ |
| Image | VisualCloze | FluxTransformer2DModel | 2 | ✔️/✔️/✔️/✔️ |
| Image | Amused | UVit2DModel | 1 | ✔️/✖️/✔️/✔️ |
| Video | CogVideoX | CogVideoXTransformer3DModel | 4 | ✔️/✔️/✔️/✔️ |
| Video | Wan (T2V/I2V) | WanTransformer3DModel | 3 | ✔️/✔️/✔️/✔️ |
| Video | Wan-VACE | WanVACETransformer3DModel | 1 | ✔️/✔️/✔️/✔️ |
| Video | HunyuanVideo | HunyuanVideoTransformer3DModel | 4 | ✔️/✔️/✔️/✔️ |
| Video | HunyuanVideo 1.5 | HunyuanVideo15Transformer3DModel | 2 | ✔️/✖️/✔️/✔️ |
| Video | Mochi | MochiTransformer3DModel | 1 | ✔️/✔️/✔️/✔️ |
| Video | Allegro | AllegroTransformer3DModel | 1 | ✔️/✖️/✔️/✔️ |
| Video | EasyAnimate | EasyAnimateTransformer3DModel | 3 | ✔️/✖️/✔️/✔️ |
| Video | ConsisID | ConsisIDTransformer3DModel | 1 | ✔️/✔️/✔️/✔️ |
| Video | Cosmos | CosmosTransformer3DModel | 7+ | ✔️/✖️/✔️/✔️ |
| Video | LTX | LTXVideoTransformer3DModel | 5 | ✔️/✔️/✔️/✔️ |
| Video | LTX2 | LTX2VideoTransformer3DModel | 6 | ✔️/✔️/✔️/✔️ |
| Video | Helios | HeliosTransformer3DModel | 2 | ✔️/✔️/✔️/✔️ |
| Video | ChronoEdit | ChronoEditTransformer3DModel | 1 | ✔️/✔️/✔️/✔️ |
| Video | SkyReelsV2 | SkyReelsV2Transformer3DModel | 5 | ✔️/✔️/✔️/✔️ |
| Video | Kandinsky5 (T2V/I2V) | Kandinsky5Transformer3DModel | 4 | ✔️/✔️/✔️/✔️ |
| Audio | StableAudio | StableAudioDiTModel | 1 | ✔️/✖️/✔️/✔️ |
| 3D/Other | Shap-E | PriorTransformer | 1 | ✔️/✖️/✔️/✔️ |
| Other | LucyEdit | WanTransformer3DModel | 1 | ✔️/✔️/✔️/✔️ |

C: Hybrid Cache (DBCache + Calibrator: TaylorSeer/DMD/FoCa/SCM); P: Parallelism (Ulysses/Ring/USP/TP/TE-P/VAE-P/2D-P/3D-P); Q: Quantization (W8A8, W4A4); OF: Bucket-style Layerwise Offload (~0 overhead)
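
When working with the support matrix programmatically, the C/P/Q/OF column can be decoded into named flags. The snippet below is a small sketch of that; the helper `decode_support` and the `FEATURES` names are hypothetical and exist only for this example, they are not part of Cache-DiT.

```python
from typing import Dict

# Column order follows the legend: C / P / Q / OF.
FEATURES = ("cache", "parallelism", "quantization", "offload")

CHECK = "\u2714"  # heavy check mark; the table cells append a variation selector


def decode_support(cell: str) -> Dict[str, bool]:
    """Turn a cell such as '✔️/✖️/✔️/✔️' into a feature -> supported mapping."""
    parts = [p.strip() for p in cell.split("/")]
    if len(parts) != len(FEATURES):
        raise ValueError(f"expected {len(FEATURES)} flags, got {len(parts)}: {cell!r}")
    return {name: part.startswith(CHECK) for name, part in zip(FEATURES, parts)}


# Example: the SD3 row, where parallelism is marked as unsupported.
sd3 = decode_support("\u2714\ufe0f/\u2716\ufe0f/\u2714\ufe0f/\u2714\ufe0f")
```

Here `sd3["parallelism"]` is `False` while the other three flags are `True`, matching the SD3 row of the table.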
Cache-DiT provides a model-integration SKILL that helps users integrate new DiT pipelines into Cache-DiT, covering Cache, CP, TP, TE-P, VAE-P, and carefully designed test cases. It can be used with coding agents, e.g., GitHub Copilot, Claude Code, Open Code.
Note: Quantization and layerwise offload in Cache-DiT are generally supported for any nn.Module, so no extra integration is needed for new DiT pipelines or transformers.
- 🎉ComfyUI x Cache-DiT
- 🎉(Intel) llm-scaler x Cache-DiT
- 🎉Diffusers x Cache-DiT
- 🎉TensorRT-LLM x Cache-DiT
- 🎉SGLang Diffusion x Cache-DiT
- 🎉vLLM-Omni x Cache-DiT
- 🎉Nunchaku x Cache-DiT
- 🎉SD.Next x Cache-DiT
- 🎉stable-diffusion.cpp x Cache-DiT
- 🎉jetson-containers x Cache-DiT
Special thanks to vipshop's Computer Vision AI Team for supporting the testing and deployment of this project. We learned from and reused code from: Diffusers, SGLang, vLLM-Omni, Nunchaku, xDiT and TaylorSeer.
@misc{cache-dit@2025,
title={Cache-DiT: A PyTorch-native Inference Engine with Cache, Parallelism, Quantization and CPU Offload for DiTs.},
url={https://github.com/vipshop/cache-dit.git},
note={Open-source software available at https://github.com/vipshop/cache-dit.git},
author={DefTruth, vipshop.com, etc.},
year={2025}
}


