Enable tcgen05 blockscaled ops on Thor SM110 #3283

Open
xiangg-nv wants to merge 1 commit into NVIDIA:main from xiangg-nv:feature/cutedsl-thor-sm110

@xiangg-nv xiangg-nv commented May 28, 2026

Edge-LLM NvFP4 MoE CuTeDSL kernels on Thor use tcgen05 blockscaled MMA and SMEM-to-TMEM scale-factor copies. (We currently apply this patch locally as a workaround.)

The existing checks only admitted the SM100/SM103 paths, so source-built CuTeDSL rejected SM110.

Admit Thor's blockscaled MMA arch aliases sm_101a and sm_110a, and allow the SM110f family for S2T tcgen05 copy ops.
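
For readers unfamiliar with this kind of gate, the change amounts to widening an allow-list of architecture strings. The sketch below is only an illustration of that idea and is not the code in `mma.py` or `copy.py`: the function name, the set names, and the `sm_100a`/`sm_103a` spellings for the previously admitted paths are assumptions; only `sm_101a` and `sm_110a` come from this PR.

```python
# Illustrative sketch only. None of these names are CuTeDSL APIs.

# Assumed spellings for the SM100/SM103 paths that were already admitted.
PREVIOUS_MMA_ARCHS = frozenset({"sm_100a", "sm_103a"})

# Thor aliases named in this PR.
THOR_MMA_ARCHS = frozenset({"sm_101a", "sm_110a"})


def admits_blockscaled_mma(arch: str, allow_thor: bool = True) -> bool:
    """Return True if `arch` is on the (hypothetical) blockscaled MMA allow-list.

    `allow_thor=False` models the behaviour before this change,
    where only the SM100/SM103 paths were accepted.
    """
    if allow_thor:
        allowed = PREVIOUS_MMA_ARCHS | THOR_MMA_ARCHS
    else:
        allowed = PREVIOUS_MMA_ARCHS
    return arch in allowed


if __name__ == "__main__":
    for arch in ("sm_100a", "sm_101a", "sm_110a", "sm_90a"):
        before = admits_blockscaled_mma(arch, allow_thor=False)
        after = admits_blockscaled_mma(arch)
        print(f"{arch}: before={before} after={after}")
```

A check of this shape could also serve as a starting point for an architecture-gating unit test that does not need SM110 hardware, since it only exercises the allow-list logic.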

Validation:

  • git diff --check

  • python3 -m py_compile python/CuTeDSL/cutlass/cute/nvgpu/tcgen05/mma.py python/CuTeDSL/cutlass/cute/nvgpu/tcgen05/copy.py

  • grouped_blockscaled_gemm.py documented 4-group example on Thor SM110: PASS

  • Edge-LLM nvfp4_moe AOT for sm_110/aarch64: 12/12 variants PASS

Collaborator

@brandon-yujie-sun brandon-yujie-sun left a comment

Thanks for contributing the code! Would it be possible to also add some test cases?

Comment thread python/CuTeDSL/cutlass/cute/nvgpu/tcgen05/mma.py Outdated
@xiangg-nv xiangg-nv force-pushed the feature/cutedsl-thor-sm110 branch from e19678b to da87ebf Compare May 29, 2026 10:22
@xiangg-nv
Author

> Thanks for contributing the code! Would it be possible to also add some test cases?

I'm not sure what kind of test case I should add. It seems we don't have any SM110 test cases yet. Could you help with this?
