Replace rsmi_init with amdsmi_init (via dlsym) in intra_node_comm #3299

Open: adam360x wants to merge 1 commit into develop from users/adam360x/fix-rsmi-init-interposition

adam360x commented Jun 14, 2026

Summary

  • Replaces the direct rsmi_init(0) call in intra_node_comm.cpp with a call to amdsmi_init resolved via dlsym(RTLD_DEFAULT, "amdsmi_init")
  • Eliminates the dangling undefined rsmi_init symbol from libtorch_hip.so
  • No CMake or link-time dependency changes

Problem

libtorch_hip.so calls rsmi_init() but does not list librocm_smi64.so as a NEEDED dependency, so rsmi_init is left as an undefined symbol that is resolved at runtime. libamd_smi.so also exports rsmi_init for backward compatibility. When it is loaded with RTLD_GLOBAL (via the ROCm SDK dependency chain), the dynamic linker interposes that rsmi_init over libamd_smi.so's own internal copy. This leaves AMDSMI's RSMI singleton uninitialized, so users who call amdsmi_init() after importing torch see zero devices or sentinel values (e.g. gfxffffffffffffffff).

Reproducer: import torch before amdsmi_init() on navi31/navi48 with pip-installed amdsmi and ROCm 7.12/7.13 nightlies.

Root cause verification

Confirmed via isolated ctypes tests on a Radeon RX 7900 XT (gfx1100):

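```python
import ctypes
from ctypes import RTLD_GLOBAL, RTLD_LOCAL

# Run each scenario in a fresh Python process: once librocm_smi64 has been
# opened with RTLD_GLOBAL, reopening it with RTLD_LOCAL does not demote it.

# BROKEN: system librocm_smi64 loaded globally poisons nightly libamd_smi
ctypes.CDLL("librocm_smi64.so.1", mode=RTLD_GLOBAL)
nightly = ctypes.CDLL("libamd_smi.so.26")
nightly.amdsmi_init.argtypes = [ctypes.c_uint64]  # amdsmi_init(uint64_t flags)
nightly.amdsmi_init(2)  # returns SUCCESS
# amdsmi_get_socket_handles -> 0 sockets (broken)

# FIXED: with RTLD_LOCAL, no interposition
ctypes.CDLL("librocm_smi64.so.1", mode=RTLD_LOCAL)
nightly = ctypes.CDLL("libamd_smi.so.26")
nightly.amdsmi_init.argtypes = [ctypes.c_uint64]
nightly.amdsmi_init(2)  # returns SUCCESS
# amdsmi_get_socket_handles -> 2 sockets (correct)
```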
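To check which loaded object a global lookup of a symbol such as rsmi_init actually binds to in a given process, a small ctypes helper around glibc's dladdr() can be used. This is a sketch, not part of the PR: `DlInfo` and `providing_library` are hypothetical names, and it assumes Linux with glibc. It reports what a lookup through dlopen(NULL) finds, i.e. the executable, its startup dependencies and every object opened with RTLD_GLOBAL, which is the scope that interposition acts through.

```python
import ctypes
from typing import Optional


class DlInfo(ctypes.Structure):
    # Mirrors glibc's Dl_info from <dlfcn.h>.
    _fields_ = [
        ("dli_fname", ctypes.c_char_p),  # path of the containing shared object
        ("dli_fbase", ctypes.c_void_p),  # load base of that object
        ("dli_sname", ctypes.c_char_p),  # nearest symbol name
        ("dli_saddr", ctypes.c_void_p),  # address of that symbol
    ]


# dlopen(NULL): lookups search the main program, its startup dependencies,
# and all objects opened with RTLD_GLOBAL.
_SCOPE = ctypes.CDLL(None)
_dladdr = _SCOPE.dladdr
_dladdr.argtypes = [ctypes.c_void_p, ctypes.POINTER(DlInfo)]
_dladdr.restype = ctypes.c_int


def providing_library(symbol: str) -> Optional[str]:
    """Path of the object the global lookup binds `symbol` to, or None."""
    try:
        func = getattr(_SCOPE, symbol)
    except AttributeError:  # no object in the global scope exports it
        return None
    addr = ctypes.cast(func, ctypes.c_void_p).value
    info = DlInfo()
    if _dladdr(addr, ctypes.byref(info)) == 0 or not info.dli_fname:
        return None
    return info.dli_fname.decode()
```

In the BROKEN scenario one would expect `providing_library("rsmi_init")` to name librocm_smi64.so.1 rather than libamd_smi.so.26; that expectation follows from the lookup order and has not been measured here.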

Fix

Use dlsym(RTLD_DEFAULT, "amdsmi_init") to call amdsmi_init at runtime instead of linking rsmi_init directly. This:

  • Removes the rsmi_init undefined symbol from libtorch_hip.so, eliminating the interposition vector
  • Avoids any link-time NEEDED dependency on libamd_smi.so
  • Gracefully fails if libamd_smi.so is not loaded
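The PR does the lookup in C++ with dlsym(RTLD_DEFAULT, ...). The same lookup-only pattern can be sketched in Python with ctypes, where ctypes.CDLL(None) wraps dlopen(NULL) and, on Linux, searches essentially the same objects as RTLD_DEFAULT. This is an illustration, not the PR's code: `lookup_global` and `try_amdsmi_init` are hypothetical helpers, and the constants assume AMDSMI_INIT_AMD_GPUS == 2 (the flag value passed to amdsmi_init in the ctypes test above) and AMDSMI_STATUS_SUCCESS == 0.

```python
import ctypes
from typing import Callable, Optional, Sequence

AMDSMI_INIT_AMD_GPUS = 2   # assumption: same flag as amdsmi_init(2) above
AMDSMI_STATUS_SUCCESS = 0  # assumption: value of AMDSMI_STATUS_SUCCESS

# Handle for dlopen(NULL): dlsym() through it searches the executable, its
# startup dependencies, and objects opened with RTLD_GLOBAL, i.e. the same
# objects RTLD_DEFAULT searches. Nothing new is loaded.
_GLOBAL_SCOPE = ctypes.CDLL(None)


def lookup_global(name: str, restype, argtypes: Sequence) -> Optional[Callable]:
    """Return a callable for `name` if an already-loaded global object exports it."""
    try:
        fn = getattr(_GLOBAL_SCOPE, name)
    except AttributeError:  # symbol not present in the global scope
        return None
    fn.restype = restype
    fn.argtypes = list(argtypes)
    return fn


def try_amdsmi_init(flags: int = AMDSMI_INIT_AMD_GPUS) -> bool:
    """Hypothetical helper: call amdsmi_init only if libamd_smi.so is already loaded.

    Returns False (graceful degradation) when the symbol is absent or the
    call reports a non-success status. Only already-loaded objects are searched.
    """
    amdsmi_init = lookup_global("amdsmi_init", ctypes.c_int, [ctypes.c_uint64])
    if amdsmi_init is None:
        return False
    return amdsmi_init(flags) == AMDSMI_STATUS_SUCCESS
```

The design choice mirrors the bullets above: the lookup never loads libamd_smi.so itself, so there is no dependency to declare, and an absent library simply means the initialization is skipped.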

rsmi_is_P2P_accessible remains unchanged — amdsmi_init initializes the RSMI layer internally, so existing rsmi_* query calls continue to work.

Test results

Build 4: 103,130 passed, 3 failed, 30,322 skipped. All 3 failures (test_ddp_apply_optim_in_backward_ignored_params, test_host_memory_stats, test_cuda_graph_tensor_item_not_allowed) are pre-existing on develop (develop build #25 has 31 failures including the same tests).

rocm-repo-management-api (bot) commented Jun 14, 2026

Jenkins build for df95aa5c0d12e2db9f9f91644fc4e3e323910e5f commit finished as FAILURE
Links: Pipeline Overview / Build artifacts / Test Results

Detected error during Pytorch building:

      |             ^~~~~~~~~~~~~~~~~~
[7680/8176] Building CXX object caffe2/CMakeFiles/torch_hip.dir/__/torch/csrc/distributed/c10d/symm_mem/CUDASymmetricMemoryUtils.cpp.o
cc1plus: warning: command-line option ‘-Wno-duplicate-decl-specifier’ is valid for C/ObjC but not for C++
[7681/8176] Building CXX object caffe2/CMakeFiles/MaybeOwned_test.dir/__/aten/src/ATen/test/MaybeOwned_test.cpp.o
[7682/8176] Building CXX object caffe2/CMakeFiles/torch_hip.dir/__/torch/csrc/distributed/c10d/symm_mem/intra_node_comm.cpp.o
FAILED: caffe2/CMakeFiles/torch_hip.dir/__/torch/csrc/distributed/c10d/symm_mem/intra_node_comm.cpp.o 
/opt/cache/bin/sccache /opt/cache/bin/c++ -DAT_PER_OPERATOR_HEADERS -DFLASHATTENTION_DISABLE_ALIBI -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASH_NAMESPACE=pytorch_flash -DFMT_HEADER_ONLY=1 -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_POSIX_FALLOCATE=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DHIPBLASLT_USE_ROCROLLER -DIDEEP_USE_MKL -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DROCM_VERSION=70204 -DTORCH_CUDA_BUILD_MAIN_LIB -DTORCH_HIP_VERSION=702 -DUNFUSE_FMA -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_EXTERNAL_MZCRC -DUSE_FLASH_ATTENTION -DUSE_LAYERNORM_FAST_RECIPROCAL -DUSE_MEM_EFF_ATTENTION -DUSE_NCCL -DUSE_PROF_API=1 -DUSE_ROCM -DUSE_RPC -DUSE_TENSORPIPE -D_FILE_OFFSET_BITS=64 -D__HIP_PLATFORM_AMD__ -D__HIP_PLATFORM_AMD__=1 -Dtorch_hip_EXPORTS -I/var/lib/jenkins/pytorch/build/aten/src -I/var/lib/jenkins/pytorch/aten/src -I/var/lib/jenkins/pytorch/build -I/var/lib/jenkins/pytorch -I/var/lib/jenkins/pytorch/nlohmann -I/var/lib/jenkins/pytorch/moodycamel -I/var/lib/jenkins/pytorch/aten/src/THH -I/var/lib/jenkins/pytorch/third_party/mslk/include -I/var/lib/jenkins/pytorch/aten/src/ATen/hip -I/var/lib/jenkins/pytorch/aten/src/ATen/../../../third_party/composable_kernel/include -I/var/lib/jenkins/pytorch/aten/src/ATen/../../../third_party/composable_kernel/library/include -I/var/lib/jenkins/pytorch/aten/src/ATen/../../../third_party/composable_kernel/example/ck_tile/01_fmha -I/var/lib/jenkins/pytorch/build/caffe2/aten/src/ATen/composable_kernel -I/var/lib/jenkins/pytorch/aten/src/ATen/../../../third_party/aiter/csrc/include -I/var/lib/jenkins/pytorch/third_party/fmt/include -I/var/lib/jenkins/pytorch/build/caffe2/aten/src -I/var/lib/jenkins/pytorch/aten/src/ATen/.. -I/var/lib/jenkins/pytorch/torch/include -I/var/lib/jenkins/pytorch/c10/hip/../.. -I/var/lib/jenkins/pytorch/c10/.. 
-I/var/lib/jenkins/pytorch/torch/csrc/api -I/var/lib/jenkins/pytorch/torch/csrc/api/include -I/var/lib/jenkins/pytorch/build/third_party/gloo/hip -isystem /opt/rocm-7.2.4/include -isystem /var/lib/jenkins/pytorch/build/third_party/gloo -isystem /var/lib/jenkins/pytorch/cmake/../third_party/gloo -isystem /var/lib/jenkins/pytorch/cmake/../third_party/tensorpipe/third_party/libuv/include -isystem /var/lib/jenkins/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /var/lib/jenkins/pytorch/cmake/../third_party/googletest/googletest/include -isystem /var/lib/jenkins/pytorch/third_party/protobuf/src -isystem /opt/conda/envs/py_3.12/include -isystem /var/lib/jenkins/pytorch/third_party/XNNPACK/include -isystem /var/lib/jenkins/pytorch/third_party/ittapi/include -isystem /var/lib/jenkins/pytorch/cmake/../third_party/eigen -isystem /opt/rocm/include -isystem /var/lib/jenkins/pytorch/third_party/ideep/mkl-dnn/include/oneapi/dnnl -isystem /var/lib/jenkins/pytorch/third_party/ideep/include -isystem /var/lib/jenkins/pytorch/INTERFACE -isystem /var/lib/jenkins/pytorch/third_party/nlohmann/include -isystem /var/lib/jenkins/pytorch/third_party/concurrentqueue -isystem /opt/rocm-7.2.4/include/hiprand -isystem /opt/rocm-7.2.4/include/rocrand -isystem /opt/rocm/magma/include -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_MSLK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -DC10_NODEPRECATED -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -faligned-new -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-dangling-reference -Wno-error=dangling-reference 
-Wno-stringop-overflow -DHAVE_AVX512_CPU_DEFINITION -DHAVE_AVX2_CPU_DEFINITION -O3 -DNDEBUG -DNDEBUG -fPIC -fdiagnostics-color=always -DMKL_HAS_SBGEMM -DMKL_HAS_SHGEMM -DTORCH_USE_LIBUV -DCAFFE2_USE_GLOO -Wall -Wextra -Wdeprecated -Wunused -Wno-unused-parameter -Wno-missing-field-initializers -Wno-array-bounds -Wno-unknown-pragmas -Wno-strict-overflow -Wno-strict-aliasing -Wredundant-move -Wno-interference-size -Wno-maybe-uninitialized -fvisibility=hidden -fPIC -D__HIP_PLATFORM_AMD__=1 -DCUDA_HAS_FP16=1 -DUSE_ROCM -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -DTORCH_HIP_VERSION=702 -Wno-shift-count-negative -Wno-shift-count-overflow -DCAFFE2_USE_MIOPEN -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_HIP -DHIPBLAS_V2 -DHIP_ENABLE_WARP_SYNC_BUILTINS -DHIPBLASLT_OUTER_VEC -DUSE_ROCM_CK_GEMM -DHIP_VERSION=7 -Wno-duplicate-decl-specifier -DUSE_MIOPEN -MD -MT caffe2/CMakeFiles/torch_hip.dir/__/torch/csrc/distributed/c10d/symm_mem/intra_node_comm.cpp.o -MF caffe2/CMakeFiles/torch_hip.dir/__/torch/csrc/distributed/c10d/symm_mem/intra_node_comm.cpp.o.d -o caffe2/CMakeFiles/torch_hip.dir/__/torch/csrc/distributed/c10d/symm_mem/intra_node_comm.cpp.o -c /var/lib/jenkins/pytorch/torch/csrc/distributed/c10d/symm_mem/intra_node_comm.cpp
cc1plus: warning: command-line option ‘-Wno-duplicate-decl-specifier’ is valid for C/ObjC but not for C++
/var/lib/jenkins/pytorch/torch/csrc/distributed/c10d/symm_mem/intra_node_comm.cpp: In function ‘c10d::intra_node_comm::NvlMesh c10d::intra_node_comm::getNvlMesh(const std::vector<int>&)’:
/var/lib/jenkins/pytorch/torch/csrc/distributed/c10d/symm_mem/intra_node_comm.cpp:57:7: error: ‘amdsmi_processor_type_t’ was not declared in this scope; did you mean ‘amdsmi_processor_handle’?
   57 |       amdsmi_processor_type_t ptype;

adam360x force-pushed the users/adam360x/fix-rsmi-init-interposition branch from df95aa5 to 3315c81 on June 14, 2026 17:07

rocm-repo-management-api (bot) commented Jun 14, 2026

Jenkins build for 3315c812c921428a48ba172144c7a356a7f1d411 commit finished as FAILURE
Links: Pipeline Overview / Build artifacts / Test Results

Detected error during Pytorch building:

[8074/8176] Linking CXX shared library lib/libtorch.so
Warning: Unused direct dependencies:
	/var/lib/jenkins/pytorch/build/lib/libtorch_cpu.so
	/var/lib/jenkins/pytorch/build/lib/libtorch_hip.so
[8075/8176] Linking CXX executable bin/accelerator_graph_test
FAILED: bin/accelerator_graph_test 
: && /opt/cache/bin/c++ -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_MSLK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -DC10_NODEPRECATED -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -faligned-new -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-dangling-reference -Wno-error=dangling-reference -Wno-stringop-overflow -DHAVE_AVX512_CPU_DEFINITION -DHAVE_AVX2_CPU_DEFINITION -O3 -DNDEBUG -DNDEBUG -rdynamic     -Wl,--dependency-file=caffe2/CMakeFiles/accelerator_graph_test.dir/link.d -Wl,--no-as-needed caffe2/CMakeFiles/accelerator_graph_test.dir/__/aten/src/ATen/test/accelerator_graph_test.cpp.o -o bin/accelerator_graph_test -L/lib/intel64   -L/lib/intel64_win   -L/lib/win-x64 -Wl,-rpath,/lib/intel64:/lib/intel64_win:/lib/win-x64:/var/lib/jenkins/pytorch/build/lib:/opt/rocm-7.2.4/lib:/opt/rocm/lib:  lib/libgtest_main.a  lib/libgtest.a  lib/libgmock.a  -lstdc++  -Wl,--no-as-needed,"/var/lib/jenkins/pytorch/build/lib/libtorch.so" -Wl,--as-needed  -Wl,--no-as-needed,"/var/lib/jenkins/pytorch/build/lib/libtorch_cpu.so" -Wl,--as-needed  lib/libprotobuf.a  /opt/conda/envs/py_3.12/lib/libmkl_intel_lp64.a  /opt/conda/envs/py_3.12/lib/libmkl_gnu_thread.a  /opt/conda/envs/py_3.12/lib/libmkl_core.a  -fopenmp  /usr/lib/x86_64-linux-gnu/libpthread.a  -lm  /usr/lib/x86_64-linux-gnu/libdl.a  -Wl,--no-as-needed,"/var/lib/jenkins/pytorch/build/lib/libtorch_hip.so" -Wl,--as-needed  lib/libc10_hip.so  lib/libc10.so  /opt/rocm-7.2.4/lib/libMIOpen.so.1.0.70204  /opt/rocm/lib/libhiprtc.so.7.2.70204  -ldl  /opt/rocm-7.2.4/lib/libhipblas.so.3.2.70204  
/opt/rocm-7.2.4/lib/libhipfft.so.0.1.70204  /opt/rocm-7.2.4/lib/libhiprand.so.1.1.70204  /opt/rocm-7.2.4/lib/librocrand.so.1.1.70204  /opt/rocm-7.2.4/lib/libhipsparse.so.4.2.0.70204  /opt/rocm-7.2.4/lib/libhipsolver.so.1.0.70204  /opt/rocm-7.2.4/lib/librocsolver.so.0.7.70204  /opt/rocm-7.2.4/lib/librocblas.so.5.2.70204  /opt/rocm/lib/libhipblaslt.so.1.2.70204  /opt/rocm/lib/libamdhip64.so.7.2.70204  /opt/rocm-7.2.4/lib/libhipsparselt.so.0.2.70204  lib/libgtest.a && /opt/conda/envs/py_3.12/lib/python3.12/site-packages/cmake/data/bin/cmake -E __run_co_compile --lwyu="ldd;-u;-r" --source=bin/accelerator_graph_test && :
/usr/bin/ld: /var/lib/jenkins/pytorch/build/lib/libtorch_hip.so: undefined reference to `amdsmi_init'
collect2: error: ld returned 1 exit status
[8076/8176] Linking CXX executable bin/NamedTensor_test
FAILED: bin/NamedTensor_test 

adam360x force-pushed the users/adam360x/fix-rsmi-init-interposition branch from 3315c81 to 9cf4004 on June 14, 2026 18:43

rocm-repo-management-api (bot) commented Jun 14, 2026

Jenkins build for 9cf4004684351e9dad64cc3b329d8d7355cb710e commit finished as FAILURE
Links: Pipeline Overview / Build artifacts / Test Results

adam360x force-pushed the users/adam360x/fix-rsmi-init-interposition branch from 9cf4004 to 7094c1a on June 15, 2026 02:05

rocm-repo-management-api (bot) commented Jun 15, 2026

Jenkins build for 7094c1a6333859e38c13263e84c6a5dcd11320e2 commit finished as FAILURE
Links: Pipeline Overview / Build artifacts / Test Results

adam360x (Author) commented

Failing tests seem unrelated to this PR.

adam360x changed the title from "Replace legacy rsmi_* calls with AMDSMI API in intra_node_comm" to "Replace rsmi_init with amdsmi_init (via dlsym) in intra_node_comm" on Jun 15, 2026
libtorch_hip.so referenced rsmi_init and rsmi_is_P2P_accessible as
undefined symbols without listing librocm_smi64.so as a NEEDED
dependency.  When libamd_smi.so (which also exports these symbols for
backward compatibility) was loaded with RTLD_GLOBAL, the dynamic
linker interposed them over libamd_smi.so's internal copies.  This
caused AMDSMI's RSMI singleton to remain uninitialized, resulting in
zero devices or sentinel values (e.g. gfxffffffffffffffff) when
amdsmi_init() was called after torch.

Replace all rsmi_* calls (rsmi_init, rsmi_is_P2P_accessible) with
their AMDSMI equivalents (amdsmi_init, amdsmi_is_P2P_accessible),
resolved at runtime via dlsym.  This:
- removes all rsmi_*/amdsmi_* undefined symbols from libtorch_hip.so
- avoids any link-time NEEDED dependency on libamd_smi.so
- allows libamd_smi.so to drop the rsmi_* exports entirely
- removes the #include <rocm_smi/rocm_smi.h> dependency
- gracefully degrades if libamd_smi.so is not loaded
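The revised commit resolves both amdsmi_init and amdsmi_is_P2P_accessible at runtime. A Python ctypes sketch of that shape binds each symbol from the global scope through a CFUNCTYPE prototype and returns None when AMDSMI is absent. The class name `AmdSmiShim`, the C signatures quoted in the comments, and the status value are assumptions made for illustration, not taken from the PR.

```python
import ctypes
from typing import Optional

# Assumed C signatures (from amdsmi.h, not from the PR):
#   amdsmi_status_t amdsmi_init(uint64_t init_flags);
#   amdsmi_status_t amdsmi_is_P2P_accessible(amdsmi_processor_handle src,
#                                            amdsmi_processor_handle dst,
#                                            bool *accessible);
# amdsmi_status_t is an enum (int); processor handles are opaque pointers.
INIT_PROTO = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_uint64)
P2P_PROTO = ctypes.CFUNCTYPE(
    ctypes.c_int, ctypes.c_void_p, ctypes.c_void_p, ctypes.POINTER(ctypes.c_bool)
)
AMDSMI_STATUS_SUCCESS = 0  # assumption


def _bind(proto, name: str, scope: ctypes.CDLL):
    """Bind `name` from `scope` with the given prototype, or return None."""
    try:
        return proto((name, scope))
    except AttributeError:  # not exported by any object in the scope
        return None


class AmdSmiShim:
    """Hypothetical runtime-resolved wrapper; degrades to None/False without AMDSMI."""

    def __init__(self, scope: Optional[ctypes.CDLL] = None) -> None:
        # Default to dlopen(NULL), i.e. the process-wide global scope.
        scope = scope if scope is not None else ctypes.CDLL(None)
        self._init = _bind(INIT_PROTO, "amdsmi_init", scope)
        self._p2p = _bind(P2P_PROTO, "amdsmi_is_P2P_accessible", scope)

    @property
    def available(self) -> bool:
        return self._init is not None and self._p2p is not None

    def init(self, flags: int) -> bool:
        if self._init is None:
            return False
        return self._init(flags) == AMDSMI_STATUS_SUCCESS

    def is_p2p_accessible(self, src, dst) -> Optional[bool]:
        """True/False on success; None if AMDSMI is missing or the call fails."""
        if self._p2p is None:
            return None
        accessible = ctypes.c_bool(False)
        status = self._p2p(src, dst, ctypes.byref(accessible))
        if status != AMDSMI_STATUS_SUCCESS:
            return None
        return accessible.value
```

Returning None instead of False for the P2P query keeps "could not ask" distinct from "not accessible", which lets a caller decide how to degrade.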
adam360x force-pushed the users/adam360x/fix-rsmi-init-interposition branch from 7094c1a to edd012e on June 15, 2026 18:43

rocm-repo-management-api (bot) commented Jun 15, 2026

Jenkins build for edd012e63bcedf4261f65d44de95d87b8659a596 commit finished as FAILURE
Links: Pipeline Overview / Build artifacts / Test Results
