
[CuTeDSL] Lower scalar Float16/BFloat16 load through Uint16+bitcast #3267

Open
cheshire wants to merge 1 commit into NVIDIA:main from cheshire:fix/3266-bf16-load


@cheshire
Contributor

Fixes #3266

`nvvm.load.ext` rejects both `bf16` and `f16` result types at MLIR verification with "Unsupported FP type for ExtLoadOp", even though the underlying PTX op is just `ld.b16`. In `cute.arch.load`, a scalar Float16/BFloat16 request is now routed through a `Uint16` load followed by an `llvm.bitcast` back to the requested FP type. The change is transparent to callers.
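To illustrate why a `Uint16` load plus a bitcast loses nothing, here is a minimal host-side sketch in plain Python. It is not the CuTeDSL code from this PR; the helper names `bits_to_f16`, `bits_to_bf16`, and `f16_to_bits` are hypothetical and only mimic the reinterpretation that `llvm.bitcast` performs on the loaded 16 bits.

```python
import struct


def bits_to_f16(bits: int) -> float:
    """Reinterpret a 16-bit pattern as IEEE binary16 (host-side analogue of a bitcast)."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def bits_to_bf16(bits: int) -> float:
    """Reinterpret a 16-bit pattern as bfloat16.

    bfloat16 is the upper half of a float32, so placing the 16 bits in the
    high half of a 32-bit word and reading it as float32 gives the exact value.
    """
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]


def f16_to_bits(value: float) -> int:
    """Inverse of bits_to_f16, for values exactly representable in binary16."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


# The same raw 16 bits decode differently depending on the requested FP type,
# which is why the load itself can stay type-agnostic (ld.b16).
raw = 0x3C00
as_f16 = bits_to_f16(raw)    # binary16 interpretation of the pattern
as_bf16 = bits_to_bf16(raw)  # bfloat16 interpretation of the same pattern
```

The load only moves bits; the choice of FP type is applied afterwards, so deferring it to a bitcast does not change the value the caller sees.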

The same workaround handles Float16 (found while writing the regression test), so the patch covers both types. Vector loads of f16/bf16 are not touched: they go through `ir.VectorType` and were not verified to hit the same issue.
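The routing decision described above can be sketched as a small, self-contained model. Everything here is hypothetical (the `DType` class, the dtype constants, and `plan_load` are stand-ins, not the CuTeDSL API); it only shows which requests take the `Uint16` + bitcast path and which pass through unchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DType:
    """Hypothetical stand-in for a DSL element type."""
    name: str
    width: int
    is_float: bool


Float16 = DType("Float16", 16, True)
BFloat16 = DType("BFloat16", 16, True)
Float32 = DType("Float32", 32, True)
Uint16 = DType("Uint16", 16, False)

# 16-bit FP types that the PR description says nvvm.load.ext rejects.
_REJECTED_SCALAR_FP = (Float16, BFloat16)


def plan_load(dtype: DType, is_vector: bool = False):
    """Return (dtype_to_load, needs_bitcast) for a requested load.

    Scalar Float16/BFloat16 requests are loaded as Uint16 and bitcast back.
    Vector requests and all other dtypes are left alone, mirroring the
    scope stated in the PR description.
    """
    if not is_vector and dtype in _REJECTED_SCALAR_FP:
        return Uint16, True
    return dtype, False
```

For example, `plan_load(BFloat16)` selects the `Uint16` path with a bitcast, while `plan_load(Float16, is_vector=True)` is returned unchanged because vector loads are outside the scope of this patch.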

Added `test/python/CuTeDSL/test_arch_load.py`, which exercises both the worked-around 16-bit FP path and the dtypes that `nvvm.load.ext` accepts directly (Float32 / Uint16 / Uint32 / Int32) as a regression check.

Fixes NVIDIA#3266

`nvvm.load.ext` rejects both `bf16` and `f16` result types at MLIR
verification with "Unsupported FP type for ExtLoadOp", even though the
underlying PTX op is just `ld.b16`. In `cute.arch.load`, route a scalar
Float16/BFloat16 request through a `Uint16` load + `llvm.bitcast` back
to the requested FP type. Transparent to callers.

The same workaround handles Float16 — found while writing the
regression test — so the patch covers both. Vector loads of f16/bf16
are not touched (they go through `ir.VectorType` and were not verified
to hit the same issue).

Added test/python/CuTeDSL/test_arch_load.py exercising both the
worked-around 16-bit FP path and the dtypes that `nvvm.load.ext`
accepts directly (Float32 / Uint16 / Uint32 / Int32) as a regression
check.
@cheshire
Contributor Author

@grypp WDYT?


Development

Successfully merging this pull request may close these issues.

[CuTeDSL] nvvm.load.ext rejects BFloat16: "Unsupported FP type for ExtLoadOp"
