
[CUDA] JIT-compile qmm_naive#3576

Open: zcbenz wants to merge 1 commit into ml-explore:main from zcbenz:cutlass-jit

@zcbenz (Collaborator) commented May 21, 2026

Bundle the CUTLASS headers and JIT-compile the qmm_naive kernels. This reduces the binary size (#3567), which is required to meet the PyPI size limit. Most of the changes move code to backend/cuda/device and reduce the use of advanced C++ features to keep NVRTC happy.

An unfortunate side effect is that test_quantized.py now takes half an hour to run. I will see whether I can run some sub-tests in parallel or add proper caching in CI.
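
For illustration only, here is a minimal sketch of the compile-once idea behind caching JIT-compiled kernels. It is not the code in this PR: `JitCache`, `CompiledKernel`, and the injected compiler callback are hypothetical names, the callback stands in for whatever actually invokes NVRTC, and the cache is in-memory only, so it does not cover caching across CI runs.

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a compiled module. A real one would hold the
// PTX or CUBIN produced by the JIT compiler.
struct CompiledKernel {
  std::string name;
  std::string binary;
};

// Hypothetical compile-once cache keyed by kernel name. The compiler is
// injected as a callback so the sketch does not depend on NVRTC.
class JitCache {
 public:
  using Compiler =
      std::function<CompiledKernel(const std::string&, const std::string&)>;

  explicit JitCache(Compiler compiler) : compiler_(std::move(compiler)) {}

  // Returns the cached kernel, compiling it only on the first request.
  const CompiledKernel& get(const std::string& name,
                            const std::string& source) {
    auto it = cache_.find(name);
    if (it == cache_.end()) {
      ++compile_count_;
      it = cache_.emplace(name, compiler_(name, source)).first;
    }
    return it->second;
  }

  // Number of times the compiler callback was actually invoked.
  int compile_count() const { return compile_count_; }

 private:
  Compiler compiler_;
  std::unordered_map<std::string, CompiledKernel> cache_;
  int compile_count_ = 0;
};
```

With a cache of this shape, repeated requests for the same kernel name pay the compilation cost once per process. Reducing the test time across separate CI runs would additionally need the compiled output persisted somewhere, which this sketch does not attempt.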
