[release/2.12] Add support for gfx1250 by rraminen · Pull Request #3327 · ROCm/pytorch

rraminen · 2026-06-17T17:43:54Z

Add support for gfx1250

TheRock Validation: https://github.com/ROCm/TheRock/actions/runs/27717422954
Build is passing. Testing is in progress.

…odule + triton pins)

* CK - gfx1250 support (#5)
* Enable ROCM_CK_SDPA build
* [submodule] composable_kernel and aiter update (pytorch#172592)

  Summary:
  - update ck to commit ROCm/composable_kernel@fcc9372
  - update aiter to commit ROCm/aiter@9a469a6
  - changes of caffe2/aten/src/ATen/CMakeLists.txt and caffe2/caffe2/CMakeLists.txt are adopted from pytorch#161759
  - updated caffe2/aten/src/ATen/native/transformers/hip/flash_attn/ck/launch_kernel_pt.hpp to match the ck version in https://github.com/ROCm/composable_kernel/blob/292df2719f28cd01464d5d059820684790c101da/include/ck_tile/host/kernel_launch.hpp
  - update aiter fav3 bwd codegen according to changes in ROCm/aiter#1573
  - update caffe2/aten/src/ATen/native/transformers/hip/flash_attn/ck mha fwd/bwd kernels according to the interfaces in https://github.com/ROCm/composable_kernel/tree/292df2719f28cd01464d5d059820684790c101da/example/ck_tile/01_fmha

  Differential Revision: D88991877
  Pull Request resolved: pytorch#172592
  Approved by: https://github.com/alugorey, https://github.com/izaitsevfb
* Added MI450 support and packages
* Fix misaligned ck api
* Replace aiter with ck for bwd
* [ROCm] Bump AOTriton to 0.11.2b (pytorch#174105)

  Notable new features:
  * AOTriton 0.11.2b adds gfx1151/1152/1153 support.
  * Add precompiled AOTriton runtime for ROCM 7.2
  * Match the sliding window attention behavior of `_flash_attention_forward/backward` with CUTLASS backend.

  Bug fixes:
  * Fixes pytorch#173204. Now all tests in `test/test_varlen_attention.py` are enabled on ROCm

  Notes: This replaces PR pytorch#173820 and pytorch#173469

  Pull Request resolved: pytorch#174105
  Approved by: https://github.com/jeffdaily
* Fix philox data types for this version of ck
* Update CK to use new gfx1250_pytorch branch
* Add new gfx1250 compile flags for CK
* add --targets to generate and a couple new compile flags
* Remove default USE_ROCM_CK_SDPA

---------

Co-authored-by: blorange-amd <bo.li2@amd.com>
Co-authored-by: Yu Guo <yuguo@meta.com>
Co-authored-by: Xinya Zhang <Xinya.Zhang@amd.com>

* Updated aiter module
* Fixed merged error
* Fixed additional merged error
* Reset USE_ROCM_CK_SDPA config

---------

Co-authored-by: LugoReyes, Andy <Andy.LugoReyes@amd.com>
Co-authored-by: Yu Guo <yuguo@meta.com>
Co-authored-by: Xinya Zhang <Xinya.Zhang@amd.com>

Fix `torch.arange` (and the other range factories sharing this kernel) for very large outputs on ROCm.

`torch.arange(N)` with `N >= 2^32` fails on ROCm because `hipLaunchKernel` does not support `gridDim.x * blockDim.x >= 2^32` for the per-thread kernel that `aten/src/ATen/native/cuda/RangeFactories.cu` previously used. Depending on the ROCm version the launch returns `hipErrorInvalidConfiguration` or is accepted silently with the kernel never executing, leaving zero-initialized output. Concrete repro: `torch.arange(2 ** 32 + 1, device="cuda", dtype=torch.int32)`.

The fix replaces the per-thread launch on the ROCm path with a grid-stride loop that fixes the grid at `sm_count * 4` blocks, so the launch limit is no longer load-bearing for correctness regardless of `N`. The non-ROCm path is untouched.

On MI250X the grid-stride kernel matches the per-thread kernel within noise at `N=1024` and is 24-60% faster from `N=1M` up across `int32`, `int64`, and `float32`. On MI300X the grid-stride kernel matches within noise at `N=1024` and `N=1M`, and is 2-5x faster from `N=64M` up across `int32`, `int64`, and `float32`.

The 64-bit-indexing test is extended to also cover `N = 2^32 + 1` and `N = 2^33 + 1` on ROCm when memory permits.

Pull Request resolved: pytorch#182657
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
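The launch-limit arithmetic in the `torch.arange` fix above can be checked with a small host-side Python model. This is only a sketch of the two indexing schemes, not the HIP kernel itself: the block size of 256 and the SM count of 104 are illustrative assumptions, and `per_thread_threads` and `grid_stride_indices` are hypothetical helper names.

```python
LAUNCH_LIMIT = 2 ** 32  # gridDim.x * blockDim.x must stay below this (per the description above)


def per_thread_threads(n, block_size=256):
    """Total threads a one-element-per-thread launch needs to cover n elements."""
    num_blocks = (n + block_size - 1) // block_size  # ceiling division
    return num_blocks * block_size


def grid_stride_indices(n, thread_id, total_threads):
    """Indices visited by one thread in a grid-stride loop over n elements."""
    return range(thread_id, n, total_threads)


# Per-thread scheme: the repro size needs more threads than the launch allows.
n = 2 ** 32 + 1
per_thread_exceeds_limit = per_thread_threads(n) >= LAUNCH_LIMIT

# Grid-stride scheme: the grid is fixed at sm_count * 4 blocks, independent of n.
sm_count, block_size = 104, 256  # assumed values, for illustration only
fixed_total_threads = sm_count * 4 * block_size
fixed_grid_within_limit = fixed_total_threads < LAUNCH_LIMIT

# Coverage check on a small problem: every index is visited exactly once.
small_n, small_total = 1000, 48
covered = sorted(
    i
    for t in range(small_total)
    for i in grid_stride_indices(small_n, t, small_total)
)
```

Because the thread count no longer grows with `N`, the launch configuration stays valid for any output size; each thread simply loops more times.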

* TDM on release/2.11 for bring-up based on careful selection
* Triton commit: Upstream fe0c38b5262c0447fed6df0d37e02cb8ea75deb4 -> AMD-ROCm-Internal Triton 250bb5d5b821377f49dc2d83d87ded75b952f0f7; Consequence: Triton TDM support may be missing.
* Refinement according to reviewers' comments
* Added/modified UT cases; NUM_STAGES issue of ineffectiveness
* A couple of changes to related UTs
* Got rid of configs like `waves_per_cu=2`

…code

- Need to turn MSLK on for mi300 and mi350
- Need to turn CK off for gfx1250

## Motivation

## Technical Details

## Test Plan

## Test Result

## Submission Checklist

- [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

# Bump to AOTriton 0.12.50tp

Notable new features: Enable gfx1250

## Features from AOTriton 0.12b

Notable new features:
* **BREAKING** Varlen LSE tensor shape changes to (H, Total_seqlen)
* Support head_dim != head_dim_v
* Support `use_deterministic_algorithms`
* Support seqused_k in test/test_varlen_attention.py
* gfx1100 and gfx1151 promoted out of experimental
* Partial FAv3 support on gfx950

Bug Fixes:
* GQA kernel failed to read bias tensor with the right offset.

Known Issues:
* gfx950's Triton kernel has problems handling hdim=16's fwd, in addition to hdim=48/80's bwd.
* Disables gfx90a's CK SDPA support due to GPU Segfault.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Prachi Gupta <pracgupt@amd.com>

rocm-repo-management-api · 2026-06-17T17:53:59Z

Jenkins build for aeb64a7497d08b2da400801c9340834bd6bde3f1 commit finished as FAILURE
Links: Pipeline Overview / Build artifacts / Test Results

Detected error during Pytorch building:

[5809/8035] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/error_report.cpp.o
[5810/8035] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/DeviceAccelerator.cpp.o
[5811/8035] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/DeprecatedTypeProperties.cpp.o
[5812/8035] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/LegacyVmapTransforms.cpp.o
[5813/8035] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Context.cpp.o
FAILED: caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Context.cpp.o 
/opt/cache/bin/sccache /opt/cache/bin/c++ -DAT_PER_OPERATOR_HEADERS -DBUILD_ONEDNN_GRAPH -DCAFFE2_BUILD_MAIN_LIB -DCPUINFO_SUPPORTED_PLATFORM=1 -DENABLE_IPC_FABRIC -DFMT_HEADER_ONLY=1 -DFXDIV_USE_INLINE_ASSEMBLY=0 -DHAS_ROCTRACER -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_POSIX_FALLOCATE=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DIDEEP_USE_MKL -DKINETO_NAMESPACE=libkineto -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DNNP_CONVOLUTION_ONLY=0 -DNNP_INFERENCE_ONLY=0 -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DROCM_VERSION=70204 -DTORCH_HIP_VERSION=702 -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_EXTERNAL_MZCRC -DUSE_LAYERNORM_FAST_RECIPROCAL -DUSE_ROCM -DUSE_RPC -DUSE_TENSORPIPE -DXNN_LOG_LEVEL=0 -D_FILE_OFFSET_BITS=64 -D__HIP_PLATFORM_AMD__ -Dtorch_cpu_EXPORTS -I/var/lib/jenkins/pytorch/build/aten/src -I/var/lib/jenkins/pytorch/aten/src -I/var/lib/jenkins/pytorch/build -I/var/lib/jenkins/pytorch -I/var/lib/jenkins/pytorch/nlohmann -I/var/lib/jenkins/pytorch/moodycamel -I/var/lib/jenkins/pytorch/torch/csrc/api -I/var/lib/jenkins/pytorch/torch/csrc/api/include -I/var/lib/jenkins/pytorch/caffe2/aten/src/TH -I/var/lib/jenkins/pytorch/build/caffe2/aten/src/TH -I/var/lib/jenkins/pytorch/build/caffe2/aten/src -I/var/lib/jenkins/pytorch/build/caffe2/../aten/src -I/var/lib/jenkins/pytorch/torch/csrc -I/var/lib/jenkins/pytorch/torch/headeronly -I/var/lib/jenkins/pytorch/third_party/miniz-3.0.2 -I/var/lib/jenkins/pytorch/third_party/kineto/libkineto/include -I/var/lib/jenkins/pytorch/third_party/kineto/libkineto/src -I/var/lib/jenkins/pytorch/third_party/cpp-httplib -I/var/lib/jenkins/pytorch/aten/src/ATen/.. -I/var/lib/jenkins/pytorch/third_party/FXdiv/include -I/var/lib/jenkins/pytorch/c10/.. 
-I/var/lib/jenkins/pytorch/third_party/pthreadpool/include -I/var/lib/jenkins/pytorch/third_party/cpuinfo/include -I/var/lib/jenkins/pytorch/aten/src/ATen/native/quantized/cpu/qnnpack/include -I/var/lib/jenkins/pytorch/aten/src/ATen/native/quantized/cpu/qnnpack/src -I/var/lib/jenkins/pytorch/aten/src/ATen/native/quantized/cpu/qnnpack/deps/clog/include -I/var/lib/jenkins/pytorch/third_party/NNPACK/include -I/var/lib/jenkins/pytorch/third_party/FP16/include -I/var/lib/jenkins/pytorch/third_party/tensorpipe -I/var/lib/jenkins/pytorch/build/third_party/tensorpipe -I/var/lib/jenkins/pytorch/third_party/tensorpipe/third_party/libnop/include -I/var/lib/jenkins/pytorch/third_party/fmt/include -I/var/lib/jenkins/pytorch/build/third_party/ideep/mkl-dnn/include -I/var/lib/jenkins/pytorch/third_party/ideep/mkl-dnn/src/../include -I/var/lib/jenkins/pytorch/third_party/onnx -I/var/lib/jenkins/pytorch/build/third_party/onnx -I/var/lib/jenkins/pytorch/third_party/flatbuffers/include -isystem /opt/rocm-7.2.4/include -isystem /var/lib/jenkins/pytorch/build/third_party/gloo -isystem /var/lib/jenkins/pytorch/cmake/../third_party/gloo -isystem /var/lib/jenkins/pytorch/cmake/../third_party/tensorpipe/third_party/libuv/include -isystem /var/lib/jenkins/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /var/lib/jenkins/pytorch/cmake/../third_party/googletest/googletest/include -isystem /var/lib/jenkins/pytorch/third_party/protobuf/src -isystem /opt/conda/envs/py_3.12/include -isystem /var/lib/jenkins/pytorch/third_party/XNNPACK/include -isystem /var/lib/jenkins/pytorch/third_party/ittapi/include -isystem /var/lib/jenkins/pytorch/cmake/../third_party/eigen -isystem /opt/rocm/include -isystem /var/lib/jenkins/pytorch/third_party/ideep/mkl-dnn/include/oneapi/dnnl -isystem /var/lib/jenkins/pytorch/third_party/ideep/include -isystem /var/lib/jenkins/pytorch/INTERFACE -isystem /var/lib/jenkins/pytorch/third_party/nlohmann/include -isystem 
/var/lib/jenkins/pytorch/third_party/concurrentqueue -isystem /var/lib/jenkins/pytorch/build/include -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOXPUPTI=ON -DUSE_MSLK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -DC10_NODEPRECATED -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -faligned-new -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-dangling-reference -Wno-error=dangling-reference -Wno-stringop-overflow -DHAVE_AVX512_CPU_DEFINITION -DHAVE_AVX2_CPU_DEFINITION -O3 -DNDEBUG -DNDEBUG -std=gnu++20 -fPIC -fdiagnostics-color=always -DMKL_HAS_SBGEMM -DMKL_HAS_SHGEMM -DTORCH_USE_LIBUV -DCAFFE2_USE_GLOO -Wall -Wextra -Wdeprecated -Wunused -Wno-unused-parameter -Wno-missing-field-initializers -Wno-array-bounds -Wno-unknown-pragmas -Wno-strict-overflow -Wno-strict-aliasing -Wredundant-move -Wno-interference-size -Wno-maybe-uninitialized -fvisibility=hidden -pthread -fopenmp -MD -MT caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Context.cpp.o -MF caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Context.cpp.o.d -o caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Context.cpp.o -c /var/lib/jenkins/pytorch/aten/src/ATen/Context.cpp
/var/lib/jenkins/pytorch/aten/src/ATen/Context.cpp: In static member function ‘static bool at::Context::ckSupported()’:
/var/lib/jenkins/pytorch/aten/src/ATen/Context.cpp:508:1: error: version control conflict marker in file
  508 | <<<<<<< xinyazhang/backport-aotriton-0.12b-2.12_gfx1250
      | ^~~~~~~

Co-authored-by: Xinya Zhang <Xinya.Zhang@amd.com>

rocm-repo-management-api · 2026-06-17T18:53:17Z

Jenkins build for bbecc5657577e15f0c0aa057daf34b4e4be41c31 commit finished as NOT_BUILT
Links: Pipeline Overview / Build artifacts / Test Results

rocm-repo-management-api · 2026-06-17T20:36:38Z

Jenkins build for 04b54055439beb8a156f244c3fd3cdb9e31a1d3b commit finished as NOT_BUILT
Links: Pipeline Overview / Build artifacts / Test Results

This PR is to address the reviewed comments on PR #3307

rocm-repo-management-api · 2026-06-17T21:36:30Z

Jenkins build for 43a62ac6cc17a57487826c1b7b0f6e7cf96a43c1 commit finished as NOT_BUILT
Links: Pipeline Overview / Build artifacts / Test Results

Fix typo

rocm-repo-management-api · 2026-06-18T00:06:47Z

Jenkins build for 43a62ac6cc17a57487826c1b7b0f6e7cf96a43c1 commit finished as FAILURE
Links: Pipeline Overview / Build artifacts / Test Results

glen-amd · 2026-06-18T15:34:42Z

 # generate a list of kernels, but not actually emit files at config stage
 execute_process(
-  COMMAND python3 ${CMAKE_SOURCE_DIR}/third_party/composable_kernel/example/ck_tile/01_fmha/generate.py
+  COMMAND python3 ${CMAKE_SOURCE_DIR}/third_party/composable_kernel/example/ck_tile/01_fmha/generate.py --targets gfx1250


Here, --targets gfx1250 is appended to every generate.py invocation, unconditionally.

This restricts CK FMHA blob generation to gfx1250 for all builds, including pure gfx942/gfx950 builds. A PYTORCH_ROCM_ARCH=gfx942 build will now emit only gfx1250 kernels and lose its own FMHA code objects.

Suggested fix: derive --targets from PYTORCH_ROCM_ARCH (filtered to CK-supported archs), or drop the flag entirely and keep the generator's default multi-target behavior.
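One way to picture the suggested fix is the following Python sketch, which derives the generator arguments from a semicolon-separated `PYTORCH_ROCM_ARCH` value. The allowlist `CK_SUPPORTED_ARCHS` and the helper `ck_target_args` are hypothetical, the real set of CK-supported archs is not stated in this thread, and passing several archs to `--targets` as a comma-separated value is an assumption about `generate.py` that would need checking.

```python
# Hypothetical allowlist, for illustration only.
CK_SUPPORTED_ARCHS = {"gfx942", "gfx950"}


def ck_target_args(pytorch_rocm_arch, supported=CK_SUPPORTED_ARCHS):
    """Build the extra generate.py arguments from a PYTORCH_ROCM_ARCH string.

    Returns an empty list when no requested arch is CK-supported, so the
    caller falls back to the generator's default behavior instead of
    forcing a single hard-coded target.
    """
    requested = [a.strip() for a in pytorch_rocm_arch.split(";") if a.strip()]
    targets = [a for a in requested if a in supported]
    if not targets:
        return []
    # Assumption: multiple targets are passed as one comma-separated value.
    return ["--targets", ",".join(targets)]


single_arch = ck_target_args("gfx942")
mixed_arch = ck_target_args("gfx942;gfx950;gfx1250")
only_unsupported = ck_target_args("gfx1250")
```

With this shape a pure gfx942 build keeps its own FMHA targets, and a mixed build drops only the arch that CK cannot handle.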

CK is turned off for gfx1250, not a priority at the moment. We can probably just drop this.

glen-amd · 2026-06-18T15:43:15Z

 constexpr size_t kSmallSize = 1048576;
 // allocations between 1 and 10 MiB may use kLargeBuffer
-constexpr size_t kMinLargeAlloc = 10485760;
+#if defined(USE_ROCM) && defined(__gfx1250__)


This is host-side allocator code, but __gfx1250__ is a device-compilation predefine. Would this #if block therefore never be compiled?

glen-amd · 2026-06-18T15:46:29Z

    auto persistent_counter = mk_atomictensor(is_causal ? atomic_counter.data_ptr<int32_t>() : nullptr);
    hipError_t err; // TODO: Error handling
-    if constexpr (AOTRITON_ALWAYS_V3_API) {  // Better readability than nesting ifdef
-#if AOTRITON_V3_API  // if constexpr does not stop errors from undefined functions


Removing the code block below hard-switches to the AOTriton v3 API and drops the v2 fallback without a build guard. Would this introduce a portability risk?

Hi @xinyazhang, could you please help me address this review comment regarding the cherry-pick of aeb64a7?

This change is already in upstream. pytorch@5e3cb3e

Ultimately, 0.12.50tp is 0.12b with gfx1250 support (yes, it's ABI compatible). The related PR is likewise 0.12b's PR plus a version bump from 0.12b to 0.12.50tp.

glen-amd · 2026-06-18T15:50:46Z

 #if ROCM_VERSION >= 70000
-  TORCH_CHECK_NOT_IMPLEMENTED(at::detail::getCUDAHooks().isGPUArch({"gfx950"}),
-              "Block-wise scaling for Float8_e8m0fnu is only supported on gfx950");
+  TORCH_CHECK_NOT_IMPLEMENTED(at::detail::getCUDAHooks().isGPUArch({"gfx950", "gfx1250"}),


Above, _scaled_mm_allowed_device() (line ~82) gates gfx1250 at ROCM_VERSION >= 70200. So on ROCm 7.0/7.1 the device is rejected by _scaled_mm_allowed_device(), yet these inner checks would admit it.

How about nesting #if ROCM_VERSION >= 70200 inside each isGPUArch({...})?
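The mismatch can be modeled in a few lines of Python. This is only an illustration of the gating logic, not the C++ preprocessor code: `rocm_version_code` and `block_scaling_archs` are hypothetical names, and the version encoding is inferred from the build log in this thread, where `-DROCM_VERSION=70204` accompanies `/opt/rocm-7.2.4`.

```python
def rocm_version_code(major, minor, patch=0):
    """Encode a ROCm version the way ROCM_VERSION appears in the build log."""
    return major * 10000 + minor * 100 + patch


def block_scaling_archs(rocm_version):
    """Archs the inner check would admit if gfx1250 were gated at 7.2+.

    Keeping this decision in one place means the outer device gate and the
    inner arch checks cannot disagree about gfx1250.
    """
    archs = {"gfx950"}
    if rocm_version >= rocm_version_code(7, 2):
        archs.add("gfx1250")
    return archs


on_rocm_7_1 = block_scaling_archs(rocm_version_code(7, 1))
on_rocm_7_2_4 = block_scaling_archs(rocm_version_code(7, 2, 4))
```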

glen-amd · 2026-06-18T15:53:25Z

    try {
        if (at::cuda::device_count() > 0) {
-            g_hipSparseLtSupported = at::detail::getCUDAHooks().isGPUArch({"gfx950", "gfx942"}, 0);
+            g_hipSparseLtSupported = at::detail::getCUDAHooks().isGPUArch({"gfx950", "gfx942", "gfx1250"}, 0);


Can we confirm whether hipSparseLT requires ROCm 7.2+?
gfx1250 is advertised unconditionally here, which might fail deeper.

Yes, hipsparselt actually requires ROCm >=7.12. A PR is in progress: pytorch#178737

glen-amd · 2026-06-18T16:00:40Z

-        CK_USE_GFX94
+        #CK_USE_FNUZ_FP8
+        #CK_USE_GFX94
+        CK_USE_GFX1250


The changes here and below alter the CK SDPA compile definitions globally.
Are CK/AITER artifacts for GFX1250 actually ready and validated?

glen-amd · 2026-06-18T16:04:06Z


+    # composable_kernel has no gfx1250 support, so its CK GEMM/SDPA kernels fail
+    # to compile for that arch. 
+    if("gfx1250" IN_LIST PYTORCH_ROCM_ARCH)


This disables both USE_ROCM_CK_GEMM and USE_ROCM_CK_SDPA whenever PYTORCH_ROCM_ARCH contains gfx1250.
Would this break mixed-arch builds such as gfx942;gfx950;gfx1250?

Yes, this is highly problematic for multi-arch builds. Could we follow the same approach we use for other sub-components of the PyTorch build, using a temporary HIP_CLANG_FLAGS override?

This was solved on release/2.11 in #3346 (merged). That PR moves the logic out of Dependencies.cmake and into aten/src/ATen/CMakeLists.txt:

pytorch/aten/src/ATen/CMakeLists.txt

Lines 203 to 216 in 712584b

    # composable_kernel lacks gfx1250 support. CK GEMM/SDPA are otherwise built for
    # every arch except gfx1250 (the --offload-arch filtering below). If gfx1250 is
    # the ONLY arch there is no supported arch left to build CK for, so disable both
    # entirely here. caffe2_update_option writes the cache, so this is honored by the
    # conditional CK GEMM/SDPA defines and links in caffe2/CMakeLists.txt.
    if(USE_ROCM AND "gfx1250" IN_LIST PYTORCH_ROCM_ARCH)
      set(_ck_supported_archs ${PYTORCH_ROCM_ARCH})
      list(REMOVE_ITEM _ck_supported_archs gfx1250)
      if("${_ck_supported_archs}" STREQUAL "")
        message(WARNING "gfx1250 is the only arch in PYTORCH_ROCM_ARCH: disabling USE_ROCM_CK_GEMM and USE_ROCM_CK_SDPA (composable_kernel lacks gfx1250 support)")
        caffe2_update_option(USE_ROCM_CK_GEMM OFF)
        caffe2_update_option(USE_ROCM_CK_SDPA OFF)
      endif()
    endif()

glen-amd · 2026-06-18T16:08:09Z

  // ifdef USE_ROCM_CK_GEMM is required since ROCm systems w/o CK should not call ck path.
 #if defined(USE_ROCM_CK_GEMM)
-  if (at::globalContext().rocmAllowGroupGemmCk() && at::detail::getCUDAHooks().isGPUArch({"gfx942", "gfx950", "gfx90a"})) {
+  if (at::globalContext().rocmAllowGroupGemmCk() && at::detail::getCUDAHooks().isGPUArch({"gfx942", "gfx950", "gfx90a", "gfx1250"})) {


Is it true that the existing CK grouped GEMM path is the Wave64/MFMA/XDL path used by gfx90a/gfx942/gfx950? If so, since gfx1250 is Wave32 and WMMA/SWMMAC-oriented, it probably should not be routed into this path by arch-name allowlisting alone.

glen-amd · 2026-06-18T16:10:13Z

  auto *from = reinterpret_cast<const vec_t *>(base_ptr);
-#if defined(USE_ROCM) && defined(__gfx942__)
+  // Extend the non-temporal load optimization to GFX1250.
+#if defined(USE_ROCM) && (defined(__gfx942__) || defined(__gfx1250__))


By simply extending this to gfx1250, would another Wave64-tuned path end up being applied to Wave32 hardware?

glen-amd · 2026-06-18T16:13:16Z

+            # CDNA4 (gfx950) 160KB, and CDNA5 (gfx1250) 320KB.
+            if device_props.gcnArchName == "gfx950":
+                max_shared_mem = 160 * 1024
+            elif device_props.gcnArchName == "gfx1250":


gcnArchName can include feature suffixes such as gfx1250:sramecc+:xnack-. Would this exact comparison lead to an unexpected fallback?
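A minimal sketch of a suffix-tolerant lookup, assuming the arch name always precedes the first `:`. The helpers `base_gcn_arch` and `max_shared_mem_for` are hypothetical names; the sizes come from the comment in the diff above (160KB for gfx950, 320KB for gfx1250), and the fallback value is left to the caller because the existing default is not shown in this thread.

```python
def base_gcn_arch(gcn_arch_name):
    """Strip feature suffixes, e.g. 'gfx1250:sramecc+:xnack-' -> 'gfx1250'."""
    return gcn_arch_name.split(":", 1)[0]


# Sizes taken from the diff comment: CDNA4 (gfx950) 160KB, CDNA5 (gfx1250) 320KB.
SHARED_MEM_BY_ARCH = {
    "gfx950": 160 * 1024,
    "gfx1250": 320 * 1024,
}


def max_shared_mem_for(gcn_arch_name, default):
    """Look up shared memory by base arch, falling back to the caller's default."""
    return SHARED_MEM_BY_ARCH.get(base_gcn_arch(gcn_arch_name), default)


with_suffix = max_shared_mem_for("gfx1250:sramecc+:xnack-", default=64 * 1024)
without_suffix = max_shared_mem_for("gfx950", default=64 * 1024)
unlisted = max_shared_mem_for("gfx942:sramecc+:xnack-", default=64 * 1024)
```

Comparing on the base name keeps the lookup stable whether or not the runtime reports feature flags.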

jithunnair-amd · 2026-06-18T22:36:00Z

These changes are unnecessary unless we know for certain any build workflows that would use it. TheRock build workflows don't.

jithunnair-amd · 2026-06-18T22:36:05Z

These changes are unnecessary unless we know for certain any build workflows that would use it. TheRock build workflows don't.

jithunnair-amd · 2026-06-18T22:43:42Z

 __device__ inline __hip_bfloat162 preview_unsafeAtomicAdd(__hip_bfloat162* address, __hip_bfloat162 value) {
-#if (defined(__gfx942__)) && \
+// `__gfx1250__`-specific `s_wait_loadcnt(0)` path for committed store already there
+#if (defined(__gfx942__) || defined(__gfx1250__)) && \


Does this change matter now, if the outer condition is #if ROCM_VERSION < 60400?

Addressed in #3347

rraminen and others added 7 commits June 17, 2026 09:16

Cherry-pick bdbcbea8dbf09fb95685d499cd6b1de1e04fe4b0 (exclude CK subm…

632d652

…odule + triton pins)

Bug fix - Remove drop_seed, drop_offset as there are not used in the …

07ec88f

…code

rraminen requested a review from jeffdaily as a code owner June 17, 2026 17:43

Fixes previous PR: [ROCm] Bump AOTriton to 0.12.50tp (#3328)

55719a5

Co-authored-by: Xinya Zhang <Xinya.Zhang@amd.com>

Turn off MSLK's CK Kernels for gfx1250 (#3329)

bbecc56

[release/2.12_gfx1250] Support for gfx1250 - bug fixes (#3330)

04b5405

This PR is to address the reviewed comments on PR #3307

This was referenced Jun 17, 2026

[releease/2.12] Support for gfx1250 #3322

Closed

[release/2.11] Add support for gfx1250 #3307

Closed

naromero77amd reviewed Jun 17, 2026

Comment thread test/inductor/test_max_autotune.py Outdated

[release/2.12_gfx1250] Fix typo (#3331)

43a62ac

Fix typo

glen-amd reviewed Jun 18, 2026

jithunnair-amd reviewed Jun 18, 2026

pragupta changed the title ~~[release/2.12] Support for gfx1250~~ [release/2.12] Add support for gfx1250 Jun 19, 2026

rahulc-gh mentioned this pull request Jun 25, 2026

[issue] gfx1250 pytorch build temporary env vars ROCm/TheRock#5833

Open
