GitHub - vipshop/cache-dit: A PyTorch-native inference engine with cache, parallelism, quantization and cpu offload for DiTs.

⚡️🎉A PyTorch-native Inference Engine with Cache,
Parallelism, Quantization and CPU Offload for DiTs

🤗Why Cache-DiT❓❓

Cache-DiT is built on top of the 🤗Diffusers library and now supports nearly ALL DiTs from Diffusers. It provides hybrid cache acceleration (DBCache, TaylorSeer, SCM, etc.) and comprehensive parallelism optimizations, including Context Parallelism, Tensor Parallelism, hybrid 2D or 3D parallelism, and dedicated extra parallelism support for Text Encoder, VAE, and ControlNet.

Cache-DiT is compatible with compilation, CPU offloading, and quantization; integrates with SGLang Diffusion, vLLM-Omni, TensorRT-LLM, and ComfyUI; and runs natively on NVIDIA GPUs, Ascend NPUs, and AMD GPUs. It is fast, easy to use, and flexible across DiTs (online docs at 📘cache-dit.io). The technical report is available at 🎉Cache-DiT: A Unified PyTorch-Native Inference Engine for Diffusion Transformers.
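To give a feel for what step-to-step cache acceleration means, here is a toy, pure-Python sketch. It is not Cache-DiT's implementation or API: the class name `ToyBlockCache` and the parameters `n_probe` and `threshold` are hypothetical, and the real DBCache algorithm is described in the docs at cache-dit.io. The assumed idea is that a few leading blocks are always computed as a probe, and when the probe output barely changes between denoising steps, the contribution of the remaining blocks is reused from the previous full step.

```python
from typing import Callable, List, Optional

Vector = List[float]
Block = Callable[[Vector], Vector]  # a block returns a residual to add to its input


def rel_l1(a: Vector, b: Vector) -> float:
    """Relative L1 distance of `a` from the reference vector `b`."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(abs(y) for y in b) or 1e-12
    return num / den


class ToyBlockCache:
    """Hypothetical cache over a stack of residual blocks (illustration only)."""

    def __init__(self, blocks: List[Block], n_probe: int = 1, threshold: float = 0.05):
        self.blocks = blocks
        self.n_probe = n_probe          # leading blocks that are always computed
        self.threshold = threshold      # max relative change that still counts as a hit
        self.prev_probe: Optional[Vector] = None   # probe residual of the last full step
        self.cached_tail: Optional[Vector] = None  # residual added by the remaining blocks
        self.hits = 0

    def step(self, x: Vector) -> Vector:
        # Always run the probe blocks.
        h = list(x)
        for blk in self.blocks[: self.n_probe]:
            h = [a + b for a, b in zip(h, blk(h))]
        probe = [a - b for a, b in zip(h, x)]

        # Cache hit: the probe residual is close to the last fully computed one,
        # so skip the remaining blocks and reuse their cached contribution.
        if self.prev_probe is not None and rel_l1(probe, self.prev_probe) < self.threshold:
            self.hits += 1
            return [a + b for a, b in zip(h, self.cached_tail)]

        # Cache miss: run the remaining blocks and refresh the cache.
        out = h
        for blk in self.blocks[self.n_probe :]:
            out = [a + b for a, b in zip(out, blk(out))]
        self.prev_probe = probe
        self.cached_tail = [a - b for a, b in zip(out, h)]
        return out


# Usage: four identical toy blocks, three "denoising steps".
toy_blocks: List[Block] = [lambda h: [0.1 * v for v in h]] * 4
cache = ToyBlockCache(toy_blocks, n_probe=1, threshold=0.05)
cache.step([1.0, 2.0])    # first step: full compute
cache.step([1.01, 2.0])   # nearly unchanged input: remaining blocks are skipped
cache.step([5.0, -3.0])   # very different input: full compute again
```

The threshold trades speed for fidelity: a larger value skips more work but reuses staler results. In this sketch the reference probe is only refreshed on a full compute, which is one of several possible design choices.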

⚡️9x speedup by Cache-DiT with Cache, Context Parallelism and Compilation

📋Supported DiT Models

Cache-DiT supports 40+ DiT pipeline families (120+ variants) from 🤗Diffusers, covering the vast majority of DiT-based pipelines. For the full support matrix and detailed usage, please refer to our documentation at 📘cache-dit.io.

Modality	Pipeline Series	Transformer	Variants	C/P/Q/OF
Image	FLUX	`FluxTransformer2DModel`	10+	✔️/✔️/✔️/✔️
Image	FLUX.2	`Flux2Transformer2DModel`	1	✔️/✔️/✔️/✔️
Image	FLUX.2 Klein	`Flux2Transformer2DModel`	3	✔️/✔️/✔️/✔️
Image	ERNIE-Image	`ErnieImageTransformer2DModel`	1	✔️/✔️/✔️/✔️
Image	Qwen-Image	`QwenImageTransformer2DModel`	9+	✔️/✔️/✔️/✔️
Image	Z-Image	`ZImageTransformer2DModel`	6	✔️/✔️/✔️/✔️
Image	Joy-Image	`JoyImageEditTransformer3DModel`	1	✔️/✔️/✔️/✔️
Image	Boogu-Image	`BooguImageTransformer2DModel`	1	✔️/✔️/✔️/✔️
Image	Krea-2	`Krea2Transformer2DModel`	1	✔️/✔️/✔️/✔️
Image	Bria-Fibo	`BriaFiboTransformer2DModel`	2	✔️/✔️/✔️/✔️
Image	GLM-Image	`GlmImageTransformer2DModel`	1	✔️/✔️/✔️/✔️
Image	LongCat-Image	`LongCatImageTransformer2DModel`	2	✔️/✔️/✔️/✔️
Image	Ovis-Image	`OvisImageTransformer2DModel`	1	✔️/✔️/✔️/✔️
Image	SD3	`SD3Transformer2DModel`	3	✔️/✖️/✔️/✔️
Image	PixArt-Alpha	`PixArtTransformer2DModel`	1	✔️/✔️/✔️/✔️
Image	PixArt-Sigma	`PixArtTransformer2DModel`	1	✔️/✔️/✔️/✔️
Image	Sana (image)	`SanaTransformer2DModel`	4	✔️/✖️/✔️/✔️
Image	DiT	`DiTTransformer2DModel`	1	✔️/✔️/✔️/✔️
Image	HunyuanDiT	`HunyuanDiT2DModel`	1	✔️/✔️/✔️/✔️
Image	HunyuanDiT-PAG	`HunyuanDiT2DModel`	1	✔️/✔️/✔️/✔️
Image	AuraFlow	`AuraFlowTransformer2DModel`	1	✔️/✖️/✔️/✔️
Image	CogView4	`CogView4Transformer2DModel`	2	✔️/✔️/✔️/✔️
Image	CogView3Plus	`CogView3PlusTransformer2DModel`	1	✔️/✔️/✔️/✔️
Image	Hunyuan-Image	`HunyuanImageTransformer2DModel`	2	✔️/✔️/✔️/✔️
Image	HiDream	`HiDreamImageTransformer2DModel`	1	✔️/✖️/✔️/✔️
Image	Bria	`BriaTransformer2DModel`	1	✔️/✖️/✔️/✔️
Image	Chroma	`ChromaTransformer2DModel`	3	✔️/✔️/✔️/✔️
Image	PRX	`PRXTransformer2DModel`	2	✔️/✖️/✔️/✔️
Image	OmniGen	`OmniGenTransformer2DModel`	1	✔️/✖️/✔️/✔️
Image	Lumina	`LuminaNextDiT2DModel`	2	✔️/✖️/✔️/✔️
Image	Lumina 2	`Lumina2Transformer2DModel`	2	✔️/✖️/✔️/✔️
Image	VisualCloze	`FluxTransformer2DModel`	2	✔️/✔️/✔️/✔️
Image	Amused	`UVit2DModel`	1	✔️/✖️/✔️/✔️
Video	CogVideoX	`CogVideoXTransformer3DModel`	4	✔️/✔️/✔️/✔️
Video	Wan (T2V/I2V)	`WanTransformer3DModel`	3	✔️/✔️/✔️/✔️
Video	Wan-VACE	`WanVACETransformer3DModel`	1	✔️/✔️/✔️/✔️
Video	HunyuanVideo	`HunyuanVideoTransformer3DModel`	4	✔️/✔️/✔️/✔️
Video	HunyuanVideo 1.5	`HunyuanVideo15Transformer3DModel`	2	✔️/✖️/✔️/✔️
Video	Mochi	`MochiTransformer3DModel`	1	✔️/✔️/✔️/✔️
Video	Allegro	`AllegroTransformer3DModel`	1	✔️/✖️/✔️/✔️
Video	EasyAnimate	`EasyAnimateTransformer3DModel`	3	✔️/✖️/✔️/✔️
Video	ConsisID	`ConsisIDTransformer3DModel`	1	✔️/✔️/✔️/✔️
Video	Cosmos	`CosmosTransformer3DModel`	7+	✔️/✖️/✔️/✔️
Video	LTX	`LTXVideoTransformer3DModel`	5	✔️/✔️/✔️/✔️
Video	LTX2	`LTX2VideoTransformer3DModel`	6	✔️/✔️/✔️/✔️
Video	Helios	`HeliosTransformer3DModel`	2	✔️/✔️/✔️/✔️
Video	ChronoEdit	`ChronoEditTransformer3DModel`	1	✔️/✔️/✔️/✔️
Video	SkyReelsV2	`SkyReelsV2Transformer3DModel`	5	✔️/✔️/✔️/✔️
Video	Kandinsky5 (T2V/I2V)	`Kandinsky5Transformer3DModel`	4	✔️/✔️/✔️/✔️
Audio	StableAudio	`StableAudioDiTModel`	1	✔️/✖️/✔️/✔️
3D/Other	Shap-E	`PriorTransformer`	1	✔️/✖️/✔️/✔️
Other	LucyEdit	`WanTransformer3DModel`	1	✔️/✔️/✔️/✔️

C: Hybrid Cache (DBCache + Calibrator: TaylorSeer/DMD/FoCa/SCM); P: Parallelism (Ulysses/Ring/USP/TP/TE-P/VAE-P/2D-P/3D-P); Q: Quantization (W8A8, W4A4); OF: Bucket-style Layerwise Offload (~0 overhead)
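For readers who want to filter the support matrix programmatically, the following is a small sketch that decodes one table row into a dictionary. It assumes rows are tab-separated exactly as shown above and that the last column follows the C/P/Q/OF order from the legend; the function names `parse_support` and `parse_row` are hypothetical helpers, not part of Cache-DiT.

```python
from typing import Dict

# Column order follows the legend: Cache / Parallelism / Quantization / Offload.
FEATURES = ("cache", "parallelism", "quantization", "offload")


def parse_support(cell: str) -> Dict[str, bool]:
    """Decode a 'C/P/Q/OF' cell into feature flags."""
    marks = [m.strip() for m in cell.split("/")]
    if len(marks) != len(FEATURES):
        raise ValueError(f"expected {len(FEATURES)} marks, got {len(marks)}: {cell!r}")
    # U+2714 is the heavy check mark (supported). Any other mark, such as
    # U+2716, is treated as unsupported. A trailing variation selector
    # (U+FE0F) after the mark is ignored by startswith().
    return {name: mark.startswith("\u2714") for name, mark in zip(FEATURES, marks)}


def parse_row(line: str) -> dict:
    """Split one tab-separated table row into named fields."""
    modality, series, transformer, variants, support = line.rstrip("\n").split("\t")
    return {
        "modality": modality,
        "series": series,
        "transformer": transformer.strip("`"),
        "variants": variants,  # kept as text because some entries read like '10+'
        "support": parse_support(support),
    }


# Usage with the SD3 row from the table above.
row = parse_row(
    "Image\tSD3\t`SD3Transformer2DModel`\t3\t"
    "\u2714\ufe0f/\u2716\ufe0f/\u2714\ufe0f/\u2714\ufe0f"
)
```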

🤖Agentic Workflows

Cache-DiT provides a model-integration SKILL to help users integrate new DiT pipelines into Cache-DiT, covering Cache, CP, TP, TE-P, VAE-P, and carefully designed test cases. It can be used with coding agents, e.g., GitHub Copilot, Claude Code, or Open Code.

Note

Quantization and layerwise offload in Cache-DiT are generally supported for any `nn.Module`, so new DiT pipelines or transformers need no extra integration for these two features.
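One way to see why quantization can be pipeline-agnostic is that it operates on a layer's tensors rather than on pipeline structure. The sketch below shows symmetric per-tensor 8-bit quantization in pure Python. It is an illustration under assumptions, not Cache-DiT's code: the int8 format, the per-tensor scale, and the function names `quantize_int8` and `dequantize_int8` are all hypothetical, and the schemes Cache-DiT actually uses for W8A8 or W4A4 may differ.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Guard against an all-zero (or empty) tensor, where any scale works.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Recover approximate floats from the quantized values."""
    return [v * scale for v in q]


# Usage: the round-trip error per element is at most about half a scale step.
weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
```

Because nothing here depends on which pipeline the weights came from, the same routine could in principle be applied to every layer of any module.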

🌐Community Integration

©️Acknowledgements

Special thanks to vipshop's Computer Vision AI Team for supporting the testing and deployment of this project. We learned from and reused code from: Diffusers, SGLang, vLLM-Omni, Nunchaku, xDiT and TaylorSeer.

©️Citations

@misc{cache-dit@2025,
  title={Cache-DiT: A PyTorch-native Inference Engine with Cache, Parallelism, Quantization and CPU Offload for DiTs.},
  url={https://github.com/vipshop/cache-dit.git},
  note={Open-source software available at https://github.com/vipshop/cache-dit.git},
  author={DefTruth, vipshop.com, etc.},
  year={2025}
}

