
Fix cross-execution-space error: remove CUTLASS_HOST_DEVICE from CudaHostAdapter::memsetDevice #3286

Open
alexngUNC wants to merge 1 commit into NVIDIA:main from alexngUNC:fix/memsetdevice-host-only


@alexngUNC

Summary:

  • Remove CUTLASS_HOST_DEVICE from CudaHostAdapter::memsetDevice in include/cutlass/cuda_host_adapter.hpp.
  • memsetDevice calls the pure virtual host method memsetDeviceImpl, which implementations use to dispatch to cudaMemsetAsync / cuMemsetD*Async. It must remain host-only.
  • Fixes compile failures under --Werror cross-execution-space-call when memsetDevice is instantiated from __host__ __device__ code paths.

Context:

  • memsetDevice was host-only in 3.5.0 (#1411). In 3.5.1 (#1623), CUTLASS_HOST_DEVICE was added to copy/move/empty/size so CudaHostAdapter could be used in __host__ __device__ code. memsetDevice received the same macro by mistake even though it dispatches to virtual host APIs (memsetDeviceImpl → cudaMemsetAsync / cuMemsetD*Async).
  • Callers (zero_workspace, fill_workspace in workspace.h) are host-only.
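The distinction described above can be sketched in plain C++. This is a minimal illustration, not the actual CUTLASS code: `SKETCH_HOST_DEVICE`, `SketchHostAdapter`, `HostMemsetAdapter`, and `zero_workspace_sketch` are hypothetical names, and `std::memset` on host memory stands in for the real `cudaMemsetAsync` / `cuMemsetD*Async` dispatch. The point is only that a trivial accessor can carry the host/device macro, while a wrapper that forwards to a host-only virtual method should not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for CUTLASS_HOST_DEVICE: expands to the CUDA
// execution-space qualifiers under nvcc and to nothing in a host compiler.
#if defined(__CUDACC__)
#define SKETCH_HOST_DEVICE __host__ __device__
#else
#define SKETCH_HOST_DEVICE
#endif

// Hypothetical adapter; not the real CudaHostAdapter interface.
class SketchHostAdapter {
public:
  virtual ~SketchHostAdapter() = default;

  // Trivial accessor: calls nothing host-only, so host/device is fine.
  SKETCH_HOST_DEVICE std::size_t size() const { return size_; }

  // Host-only wrapper: it forwards to a virtual host method, so it is
  // deliberately left WITHOUT the host/device macro.
  bool memsetDevice(void* ptr, unsigned char value, std::size_t bytes) {
    return memsetDeviceImpl(ptr, value, bytes);
  }

protected:
  // Pure virtual host method that concrete adapters implement.
  virtual bool memsetDeviceImpl(void* ptr, unsigned char value,
                                std::size_t bytes) = 0;
  std::size_t size_ = 0;
};

// Assumption: host memory and std::memset replace the CUDA memset calls
// so the sketch runs without a GPU.
class HostMemsetAdapter final : public SketchHostAdapter {
protected:
  bool memsetDeviceImpl(void* ptr, unsigned char value,
                        std::size_t bytes) override {
    if (ptr == nullptr) return false;
    std::memset(ptr, value, bytes);
    return true;
  }
};

// Host-only caller, loosely analogous to zero_workspace in workspace.h.
inline bool zero_workspace_sketch(SketchHostAdapter& adapter, void* workspace,
                                  std::size_t bytes) {
  assert(workspace != nullptr || bytes == 0);
  return adapter.memsetDevice(workspace, 0, bytes);
}
```

If `memsetDevice` in this sketch were marked `SKETCH_HOST_DEVICE` and compiled with nvcc, it would be a `__host__ __device__` function calling the `__host__`-only `memsetDeviceImpl`, which is the shape of call the cross-execution-space diagnostic reports.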

Error observed when compiling with --Werror cross-execution-space-call:

error: calling a `__host__` function("memsetDeviceImpl")
from a `__host__ __device__` function("memsetDevice") is not allowed

This fixes downstream build failures, e.g. the PyTorch rowwise FP8 CUTLASS build on Windows with --Werror cross-execution-space-call.

cc @nkhasbag-nv
