Replace rsmi_init with amdsmi_init (via dlsym) in intra_node_comm #3299

Open
adam360x wants to merge 62 commits into develop from users/adam360x/fix-rsmi-init-interposition

Replace all rsmi_* usage with AMDSMI (via dlsym) in intra_node_comm

edd012e
ROCm Repo Management API / Tests / Tests / Test Inductor / Run pytorch_inductor_1 failed Jun 15, 2026 in 0s

GPUTests.test_var_mean_tile_reduction_True_cuda failed

GPUTests.test_var_mean_tile_reduction_True_cuda

AssertionError: Tensor-likes are not close!

Mismatched elements: 4 / 4 (100.0%)
Greatest absolute difference: 0.58544921875 at index (0, 3) (up to 1e-05 allowed)
Greatest relative difference: 0.57568359375 at index (0, 1) (up to 0.001 allowed)

The failure occurred for item [2]

To execute this test, run the following from the base repo dir:
    PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_torchinductor.py GPUTests.test_var_mean_tile_reduction_True_cuda

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
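
For readers interpreting the tolerance figures in the message above, here is a minimal sketch of an elementwise closeness check. It assumes the comparison follows the rule documented for `torch.testing.assert_close`, namely `|actual - expected| <= atol + rtol * |expected|`, with the `rtol=0.001` and `atol=1e-05` values reported in the log. The helper names `is_close` and `mismatch_report` and the sample values are hypothetical and are not taken from the test suite.

```python
def is_close(actual, expected, rtol=1e-3, atol=1e-5):
    """Return True when |actual - expected| <= atol + rtol * |expected|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


def mismatch_report(actual, expected, rtol=1e-3, atol=1e-5):
    """Summarize a comparison of two equal-length sequences of floats.

    Returns (mismatched_count, greatest_abs_diff, greatest_rel_diff).
    The relative difference skips entries whose expected value is zero.
    """
    pairs = list(zip(actual, expected))
    mismatched = sum(not is_close(a, e, rtol, atol) for a, e in pairs)
    max_abs = max((abs(a - e) for a, e in pairs), default=0.0)
    max_rel = max((abs(a - e) / abs(e) for a, e in pairs if e != 0), default=0.0)
    return mismatched, max_abs, max_rel


if __name__ == "__main__":
    # Hypothetical values, chosen only to illustrate the rule.
    expected = [1.0, 2.0, 3.0, 4.0]
    actual = [1.0, 2.0005, 3.5, 4.0]
    count, max_abs, max_rel = mismatch_report(actual, expected)
    print(f"Mismatched elements: {count} / {len(expected)}")
    print(f"Greatest absolute difference: {max_abs}")
    print(f"Greatest relative difference: {max_rel}")
```

Under this rule, an absolute difference near 0.585 on values of order 1 exceeds the allowed bound by several orders of magnitude, which suggests a genuinely different result rather than rounding noise.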
Stack trace
Traceback (most recent call last):
  File "/var/lib/jenkins/pytorch/test/inductor/test_torchinductor.py", line 5997, in test_var_mean
    self.common(
  File "/opt/conda/envs/py_3.12/lib/python3.12/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/var/lib/jenkins/pytorch/test/inductor/test_torchinductor.py", line 781, in check_model_gpu
    check_model(
  File "/var/lib/jenkins/pytorch/test/inductor/test_torchinductor.py", line 600, in check_model
    assert_equal_fn(
  File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
    return super().assertEqual(x, y, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 4357, in assertEqual
    raise error_metas.pop()[0].to_error(  # type: ignore[index]
AssertionError: Tensor-likes are not close!
