[release/2.12] Advance Triton pin to 3.7.1 with module fix #3360
Open
naromero77amd wants to merge 2 commits into
ROCm Repo Management API / Jenkins
failed
Jun 24, 2026 in 11h 45m 58s
Tests/Test Distributed/Run pytorch_distributed_2: warning in 'junit' step
Tests / Test Distributed / Test Distributed / Run pytorch_distributed_2 / Shell Script
Error in sh step, with arguments ./test_pytorch_test_distributed.sh.
script returned exit code 1
Build log
Build log truncated.
[2026-06-24T17:09:17.568Z] W0624 17:06:23.321000 2069567 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.568Z] W0624 17:06:23.379000 2069566 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.568Z] /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:882: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/pytorch/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.)
[2026-06-24T17:09:17.568Z] return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
[2026-06-24T17:09:17.568Z] /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:882: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/pytorch/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.)
[2026-06-24T17:09:17.568Z] return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
[2026-06-24T17:09:17.568Z] /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:882: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/pytorch/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.)
[2026-06-24T17:09:17.568Z] return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
[2026-06-24T17:09:17.568Z] /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:882: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/pytorch/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.)
[2026-06-24T17:09:17.568Z] return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
[2026-06-24T17:09:17.568Z] PASSED [15.5011s] [100%]
[2026-06-24T17:09:17.568Z]
[2026-06-24T17:09:17.568Z] - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bccd26258643046d.xml -
[2026-06-24T17:09:17.568Z] ============================== 1 passed in 15.54s ==============================
[2026-06-24T17:09:17.568Z] W0624 17:06:39.359000 2070143 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.568Z] Test results will be stored in test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7ea5f2efee8cbefe.xml
[2026-06-24T17:09:17.568Z] ============================= test session starts ==============================
[2026-06-24T17:09:17.568Z] platform linux -- Python 3.12.13, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
[2026-06-24T17:09:17.568Z] cachedir: .pytest_cache
[2026-06-24T17:09:17.568Z] hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
[2026-06-24T17:09:17.568Z] rootdir: /var/lib/jenkins/pytorch
[2026-06-24T17:09:17.568Z] configfile: pytest.ini
[2026-06-24T17:09:17.568Z] plugins: flakefinder-1.1.0, cpp-2.3.0, subtests-0.13.1, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.3.0, hypothesis-6.56.4, typeguard-4.3.0
[2026-06-24T17:09:17.568Z] collecting ... collected 1 item
[2026-06-24T17:09:17.568Z] stepcurrent: previously run test not found, not skipping.
[2026-06-24T17:09:17.568Z] Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_full_group
[2026-06-24T17:09:17.568Z]
[2026-06-24T17:09:17.568Z] distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_full_group <- ../../../../opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/distributed/distributed_test.py SKIPPED [0.0006s] (Nccl does not support CPU tensors) [100%]
[2026-06-24T17:09:17.568Z]
[2026-06-24T17:09:17.568Z] - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7ea5f2efee8cbefe.xml -
[2026-06-24T17:09:17.568Z] ============================== 1 skipped in 0.02s ==============================
[2026-06-24T17:09:17.568Z] W0624 17:06:44.207000 2070225 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.568Z] Test results will be stored in test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cc85e3c298495dc0.xml
[2026-06-24T17:09:17.568Z] ============================= test session starts ==============================
[2026-06-24T17:09:17.568Z] platform linux -- Python 3.12.13, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
[2026-06-24T17:09:17.568Z] cachedir: .pytest_cache
[2026-06-24T17:09:17.568Z] hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
[2026-06-24T17:09:17.568Z] rootdir: /var/lib/jenkins/pytorch
[2026-06-24T17:09:17.568Z] configfile: pytest.ini
[2026-06-24T17:09:17.568Z] plugins: flakefinder-1.1.0, cpp-2.3.0, subtests-0.13.1, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.3.0, hypothesis-6.56.4, typeguard-4.3.0
[2026-06-24T17:09:17.568Z] collecting ... collected 1 item
[2026-06-24T17:09:17.568Z] stepcurrent: previously run test not found, not skipping.
[2026-06-24T17:09:17.568Z] Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_object_subgroup
[2026-06-24T17:09:17.568Z]
[2026-06-24T17:09:17.568Z] distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_object_subgroup <- ../../../../opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/distributed/distributed_test.py I0624 17:06:45.718000 2070225 site-packages/torch/testing/_internal/common_distributed.py:895] Started process 0 with pid 2070309
[2026-06-24T17:09:17.568Z] I0624 17:06:45.719000 2070225 site-packages/torch/testing/_internal/common_distributed.py:895] Started process 1 with pid 2070310
[2026-06-24T17:09:17.568Z] I0624 17:06:45.720000 2070225 site-packages/torch/testing/_internal/common_distributed.py:895] Started process 2 with pid 2070311
[2026-06-24T17:09:17.568Z] I0624 17:06:45.721000 2070225 site-packages/torch/testing/_internal/common_distributed.py:895] Started process 3 with pid 2070312
[2026-06-24T17:09:17.568Z] W0624 17:06:48.297000 2070311 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.568Z] W0624 17:06:48.339000 2070309 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.568Z] W0624 17:06:48.466000 2070312 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.568Z] W0624 17:06:48.483000 2070310 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.568Z] [rank3]:W0624 17:06:49.931000 2070312 site-packages/torch/distributed/distributed_c10d.py:3344] _object_to_tensor size: 18 hash value: 6002723302436841527
[2026-06-24T17:09:17.568Z] [rank2]:W0624 17:06:49.934000 2070311 site-packages/torch/distributed/distributed_c10d.py:3344] _object_to_tensor size: 511 hash value: 9741690078026207469
[2026-06-24T17:09:17.569Z] [rank1]:W0624 17:06:49.934000 2070310 site-packages/torch/distributed/distributed_c10d.py:3344] _object_to_tensor size: 97 hash value: 2395169942717765378
[2026-06-24T17:09:17.569Z] [rank0]:W0624 17:06:49.934000 2070309 site-packages/torch/distributed/distributed_c10d.py:3344] _object_to_tensor size: 54 hash value: 15559783960315060411
[2026-06-24T17:09:17.569Z] [rank0]:W0624 17:06:57.918000 2070309 site-packages/torch/distributed/distributed_c10d.py:3359] _tensor_to_object size: 511 hash value: 8297009599345899279
[2026-06-24T17:09:17.569Z] [rank0]:W0624 17:06:57.918000 2070309 site-packages/torch/distributed/distributed_c10d.py:3359] _tensor_to_object size: 511 hash value: 8297009599345899279
[2026-06-24T17:09:17.569Z] [rank0]:W0624 17:06:57.918000 2070309 site-packages/torch/distributed/distributed_c10d.py:3359] _tensor_to_object size: 511 hash value: 8297009599345899279
[2026-06-24T17:09:17.569Z] [rank0]:W0624 17:06:57.920000 2070309 site-packages/torch/distributed/distributed_c10d.py:3359] _tensor_to_object size: 511 hash value: 8297009599345899279
[2026-06-24T17:09:17.569Z] /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
[2026-06-24T17:09:17.569Z] return func(*args, **kwargs)
[2026-06-24T17:09:17.569Z] [rank0]:[W624 17:06:57.750777831 ProcessGroupNCCL.cpp:5324] Guessing device ID based on global rank. This can cause a hang if rank to GPU mapping is heterogeneous. You can specify device_id in init_process_group()
[2026-06-24T17:09:17.569Z] PASSED [18.2122s] [100%]
[2026-06-24T17:09:17.569Z]
[2026-06-24T17:09:17.569Z] - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cc85e3c298495dc0.xml -
[2026-06-24T17:09:17.569Z] ============================== 1 passed in 18.25s ==============================
[2026-06-24T17:09:17.569Z] W0624 17:07:07.407000 2070825 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.569Z] Test results will be stored in test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-65839141e98b36dd.xml
[2026-06-24T17:09:17.569Z] ============================= test session starts ==============================
[2026-06-24T17:09:17.569Z] platform linux -- Python 3.12.13, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
[2026-06-24T17:09:17.569Z] cachedir: .pytest_cache
[2026-06-24T17:09:17.569Z] hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
[2026-06-24T17:09:17.569Z] rootdir: /var/lib/jenkins/pytorch
[2026-06-24T17:09:17.569Z] configfile: pytest.ini
[2026-06-24T17:09:17.569Z] plugins: flakefinder-1.1.0, cpp-2.3.0, subtests-0.13.1, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.3.0, hypothesis-6.56.4, typeguard-4.3.0
[2026-06-24T17:09:17.569Z] collecting ... collected 1 item
[2026-06-24T17:09:17.569Z] stepcurrent: previously run test not found, not skipping.
[2026-06-24T17:09:17.569Z] Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_irecv
[2026-06-24T17:09:17.569Z]
[2026-06-24T17:09:17.569Z] distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_irecv <- ../../../../opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/distributed/distributed_test.py SKIPPED [0.0005s] (Nccl does not support irecv) [100%]
[2026-06-24T17:09:17.569Z]
[2026-06-24T17:09:17.569Z] - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-65839141e98b36dd.xml -
[2026-06-24T17:09:17.569Z] ============================== 1 skipped in 0.02s ==============================
[2026-06-24T17:09:17.569Z] W0624 17:07:12.231000 2070891 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.569Z] Test results will be stored in test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ae080acc463b7d2f.xml
[2026-06-24T17:09:17.569Z] ============================= test session starts ==============================
[2026-06-24T17:09:17.569Z] platform linux -- Python 3.12.13, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
[2026-06-24T17:09:17.569Z] cachedir: .pytest_cache
[2026-06-24T17:09:17.569Z] hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
[2026-06-24T17:09:17.569Z] rootdir: /var/lib/jenkins/pytorch
[2026-06-24T17:09:17.569Z] configfile: pytest.ini
[2026-06-24T17:09:17.569Z] plugins: flakefinder-1.1.0, cpp-2.3.0, subtests-0.13.1, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.3.0, hypothesis-6.56.4, typeguard-4.3.0
[2026-06-24T17:09:17.569Z] collecting ... collected 1 item
[2026-06-24T17:09:17.569Z] stepcurrent: previously run test not found, not skipping.
[2026-06-24T17:09:17.569Z] Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_allreduce_hang
[2026-06-24T17:09:17.569Z]
[2026-06-24T17:09:17.569Z] distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_allreduce_hang <- ../../../../opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/distributed/distributed_test.py I0624 17:07:13.756000 2070891 site-packages/torch/testing/_internal/common_distributed.py:895] Started process 0 with pid 2071143
[2026-06-24T17:09:17.569Z] I0624 17:07:13.757000 2070891 site-packages/torch/testing/_internal/common_distributed.py:895] Started process 1 with pid 2071144
[2026-06-24T17:09:17.569Z] I0624 17:07:13.758000 2070891 site-packages/torch/testing/_internal/common_distributed.py:895] Started process 2 with pid 2071145
[2026-06-24T17:09:17.569Z] I0624 17:07:13.759000 2070891 site-packages/torch/testing/_internal/common_distributed.py:895] Started process 3 with pid 2071146
[2026-06-24T17:09:17.569Z] W0624 17:07:16.350000 2071143 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.569Z] W0624 17:07:16.464000 2071145 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.569Z] W0624 17:07:16.501000 2071144 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.569Z] W0624 17:07:16.525000 2071146 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.569Z] [rank2]:[E624 17:07:26.952835107 ProcessGroupNCCL.cpp:757] [Rank 2] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=2, OpType=ALLREDUCE, NumelIn=10, NumelOut=10, Timeout(ms)=15000) ran for 100 milliseconds before timing out.
[2026-06-24T17:09:17.569Z] [rank3]:[E624 17:07:26.953005463 ProcessGroupNCCL.cpp:757] [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=2, OpType=ALLREDUCE, NumelIn=10, NumelOut=10, Timeout(ms)=15000) ran for 100 milliseconds before timing out.
[2026-06-24T17:09:17.569Z] [rank1]:[E624 17:07:26.953029556 ProcessGroupNCCL.cpp:757] [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=2, OpType=ALLREDUCE, NumelIn=10, NumelOut=10, Timeout(ms)=15000) ran for 100 milliseconds before timing out.
[2026-06-24T17:09:17.569Z] [rank2]:[E624 17:07:26.953294531 ProcessGroupNCCL.cpp:889] [Rank 2] Work WorkNCCL(SeqNum=2, OpType=ALLREDUCE, NumelIn=10, NumelOut=10, Timeout(ms)=15000) timed out in blocking wait.
[2026-06-24T17:09:17.569Z] [rank3]:[E624 17:07:26.953394831 ProcessGroupNCCL.cpp:889] [Rank 3] Work WorkNCCL(SeqNum=2, OpType=ALLREDUCE, NumelIn=10, NumelOut=10, Timeout(ms)=15000) timed out in blocking wait.
[2026-06-24T17:09:17.569Z] [rank1]:[E624 17:07:26.953404809 ProcessGroupNCCL.cpp:889] [Rank 1] Work WorkNCCL(SeqNum=2, OpType=ALLREDUCE, NumelIn=10, NumelOut=10, Timeout(ms)=15000) timed out in blocking wait.
[2026-06-24T17:09:17.569Z] [rank0]:[E624 17:07:26.975042623 ProcessGroupGloo.cpp:67] [Rank 0]: Rank 1 failed to pass monitoredBarrier in 100 ms
[2026-06-24T17:09:17.569Z] [rank3]:[E624 17:07:26.463219915 ProcessGroupNCCL.cpp:818] [Rank 3] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[2026-06-24T17:09:17.569Z] [rank3]:[E624 17:07:26.463259878 ProcessGroupNCCL.cpp:832] [Rank 3] To avoid data inconsistency, we are taking the entire process down.
[2026-06-24T17:09:17.569Z] [rank1]:[E624 17:07:26.463550537 ProcessGroupNCCL.cpp:818] [Rank 1] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[2026-06-24T17:09:17.569Z] [rank1]:[E624 17:07:26.463566138 ProcessGroupNCCL.cpp:832] [Rank 1] To avoid data inconsistency, we are taking the entire process down.
[2026-06-24T17:09:17.569Z] [rank2]:[E624 17:07:26.463616192 ProcessGroupNCCL.cpp:818] [Rank 2] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[2026-06-24T17:09:17.569Z] [rank2]:[E624 17:07:26.463664555 ProcessGroupNCCL.cpp:832] [Rank 2] To avoid data inconsistency, we are taking the entire process down.
[2026-06-24T17:09:17.569Z] PASSED [14.8070s] [100%]
[2026-06-24T17:09:17.569Z]
[2026-06-24T17:09:17.569Z] - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ae080acc463b7d2f.xml -
[2026-06-24T17:09:17.569Z] ============================== 1 passed in 14.85s ==============================
[2026-06-24T17:09:17.569Z] W0624 17:07:32.055000 2071778 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.569Z] Test results will be stored in test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f39957c6296b2edc.xml
[2026-06-24T17:09:17.569Z] ============================= test session starts ==============================
[2026-06-24T17:09:17.569Z] platform linux -- Python 3.12.13, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
[2026-06-24T17:09:17.569Z] cachedir: .pytest_cache
[2026-06-24T17:09:17.569Z] hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
[2026-06-24T17:09:17.569Z] rootdir: /var/lib/jenkins/pytorch
[2026-06-24T17:09:17.569Z] configfile: pytest.ini
[2026-06-24T17:09:17.569Z] plugins: flakefinder-1.1.0, cpp-2.3.0, subtests-0.13.1, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.3.0, hypothesis-6.56.4, typeguard-4.3.0
[2026-06-24T17:09:17.569Z] collecting ... collected 1 item
[2026-06-24T17:09:17.569Z] stepcurrent: previously run test not found, not skipping.
[2026-06-24T17:09:17.569Z] Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo_subgroup
[2026-06-24T17:09:17.569Z]
[2026-06-24T17:09:17.569Z] distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo_subgroup <- ../../../../opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/distributed/distributed_test.py SKIPPED [0.0005s] (Test requires backend nccl to be one of {'gloo'}) [100%]
[2026-06-24T17:09:17.569Z]
[2026-06-24T17:09:17.569Z] - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f39957c6296b2edc.xml -
[2026-06-24T17:09:17.569Z] ============================== 1 skipped in 0.02s ==============================
[2026-06-24T17:09:17.569Z] W0624 17:07:36.848000 2071863 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.569Z] Test results will be stored in test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-23d9a0c987fc3b62.xml
[2026-06-24T17:09:17.569Z] ============================= test session starts ==============================
[2026-06-24T17:09:17.569Z] platform linux -- Python 3.12.13, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
[2026-06-24T17:09:17.569Z] cachedir: .pytest_cache
[2026-06-24T17:09:17.569Z] hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
[2026-06-24T17:09:17.569Z] rootdir: /var/lib/jenkins/pytorch
[2026-06-24T17:09:17.569Z] configfile: pytest.ini
[2026-06-24T17:09:17.569Z] plugins: flakefinder-1.1.0, cpp-2.3.0, subtests-0.13.1, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.3.0, hypothesis-6.56.4, typeguard-4.3.0
[2026-06-24T17:09:17.569Z] collecting ... collected 1 item
[2026-06-24T17:09:17.569Z] stepcurrent: previously run test not found, not skipping.
[2026-06-24T17:09:17.569Z] Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_periodic_model_averager
[2026-06-24T17:09:17.569Z]
[2026-06-24T17:09:17.569Z] distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_periodic_model_averager <- ../../../../opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/distributed/distributed_test.py I0624 17:07:38.334000 2071863 site-packages/torch/testing/_internal/common_distributed.py:895] Started process 0 with pid 2071946
[2026-06-24T17:09:17.569Z] I0624 17:07:38.336000 2071863 site-packages/torch/testing/_internal/common_distributed.py:895] Started process 1 with pid 2071947
[2026-06-24T17:09:17.569Z] I0624 17:07:38.337000 2071863 site-packages/torch/testing/_internal/common_distributed.py:895] Started process 2 with pid 2071948
[2026-06-24T17:09:17.569Z] I0624 17:07:38.338000 2071863 site-packages/torch/testing/_internal/common_distributed.py:895] Started process 3 with pid 2071949
[2026-06-24T17:09:17.569Z] W0624 17:07:41.060000 2071947 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.569Z] W0624 17:07:41.069000 2071948 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.569Z] W0624 17:07:41.075000 2071949 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.569Z] W0624 17:07:41.083000 2071946 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.569Z] PASSED [14.4028s] [100%]
[2026-06-24T17:09:17.569Z]
[2026-06-24T17:09:17.569Z] - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-23d9a0c987fc3b62.xml -
[2026-06-24T17:09:17.569Z] ============================== 1 passed in 14.44s ==============================
[2026-06-24T17:09:17.569Z] W0624 17:07:56.230000 2072317 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.569Z] Test results will be stored in test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0a927b55bf598c6d.xml
[2026-06-24T17:09:17.569Z] ============================= test session starts ==============================
[2026-06-24T17:09:17.569Z] platform linux -- Python 3.12.13, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
[2026-06-24T17:09:17.569Z] cachedir: .pytest_cache
[2026-06-24T17:09:17.569Z] hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
[2026-06-24T17:09:17.569Z] rootdir: /var/lib/jenkins/pytorch
[2026-06-24T17:09:17.569Z] configfile: pytest.ini
[2026-06-24T17:09:17.569Z] plugins: flakefinder-1.1.0, cpp-2.3.0, subtests-0.13.1, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.3.0, hypothesis-6.56.4, typeguard-4.3.0
[2026-06-24T17:09:17.569Z] collecting ... collected 1 item
[2026-06-24T17:09:17.569Z] stepcurrent: previously run test not found, not skipping.
[2026-06-24T17:09:17.569Z] Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity_with_hierarchical_sgd_grad_is_view
[2026-06-24T17:09:17.569Z]
[2026-06-24T17:09:17.569Z] distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity_with_hierarchical_sgd_grad_is_view <- ../../../../opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/distributed/distributed_test.py I0624 17:07:57.717000 2072317 site-packages/torch/testing/_internal/common_distributed.py:895] Started process 0 with pid 2072382
[2026-06-24T17:09:17.569Z] I0624 17:07:57.718000 2072317 site-packages/torch/testing/_internal/common_distributed.py:895] Started process 1 with pid 2072383
[2026-06-24T17:09:17.569Z] I0624 17:07:57.719000 2072317 site-packages/torch/testing/_internal/common_distributed.py:895] Started process 2 with pid 2072384
[2026-06-24T17:09:17.569Z] I0624 17:07:57.720000 2072317 site-packages/torch/testing/_internal/common_distributed.py:895] Started process 3 with pid 2072385
[2026-06-24T17:09:17.569Z] W0624 17:08:00.410000 2072385 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.569Z] W0624 17:08:00.413000 2072382 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.569Z] W0624 17:08:00.420000 2072384 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.569Z] W0624 17:08:00.434000 2072383 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.569Z] PASSED [16.9091s] [100%]
[2026-06-24T17:09:17.569Z]
[2026-06-24T17:09:17.569Z] - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0a927b55bf598c6d.xml -
[2026-06-24T17:09:17.569Z] ============================== 1 passed in 16.95s ==============================
[2026-06-24T17:09:17.569Z] W0624 17:08:18.231000 2073050 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.569Z] Test results will be stored in test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-87aa4c5d2e793551.xml
[2026-06-24T17:09:17.569Z] ============================= test session starts ==============================
[2026-06-24T17:09:17.569Z] platform linux -- Python 3.12.13, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
[2026-06-24T17:09:17.569Z] cachedir: .pytest_cache
[2026-06-24T17:09:17.569Z] hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
[2026-06-24T17:09:17.569Z] rootdir: /var/lib/jenkins/pytorch
[2026-06-24T17:09:17.569Z] configfile: pytest.ini
[2026-06-24T17:09:17.569Z] plugins: flakefinder-1.1.0, cpp-2.3.0, subtests-0.13.1, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.3.0, hypothesis-6.56.4, typeguard-4.3.0
[2026-06-24T17:09:17.569Z] collecting ... collected 1 item
[2026-06-24T17:09:17.569Z] stepcurrent: previously run test not found, not skipping.
[2026-06-24T17:09:17.569Z] Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_min
[2026-06-24T17:09:17.569Z]
[2026-06-24T17:09:17.569Z] distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_min <- ../../../../opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/distributed/distributed_test.py SKIPPED [0.0006s] (Nccl does not support CPU tensors) [100%]
[2026-06-24T17:09:17.569Z]
[2026-06-24T17:09:17.569Z] - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-87aa4c5d2e793551.xml -
[2026-06-24T17:09:17.569Z] ============================== 1 skipped in 0.02s ==============================
[2026-06-24T17:09:17.569Z] W0624 17:08:23.023000 2073160 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.569Z] Test results will be stored in test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a40e8159cbec2d27.xml
[2026-06-24T17:09:17.569Z] ============================= test session starts ==============================
[2026-06-24T17:09:17.569Z] platform linux -- Python 3.12.13, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
[2026-06-24T17:09:17.569Z] cachedir: .pytest_cache
[2026-06-24T17:09:17.569Z] hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
[2026-06-24T17:09:17.569Z] rootdir: /var/lib/jenkins/pytorch
[2026-06-24T17:09:17.569Z] configfile: pytest.ini
[2026-06-24T17:09:17.569Z] plugins: flakefinder-1.1.0, cpp-2.3.0, subtests-0.13.1, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.3.0, hypothesis-6.56.4, typeguard-4.3.0
[2026-06-24T17:09:17.569Z] collecting ... collected 1 item
[2026-06-24T17:09:17.569Z] stepcurrent: previously run test not found, not skipping.
[2026-06-24T17:09:17.569Z] Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_sum
[2026-06-24T17:09:17.569Z]
[2026-06-24T17:09:17.569Z] distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_sum <- ../../../../opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/distributed/distributed_test.py SKIPPED [0.0005s] (Nccl does not support CPU tensors) [100%]
[2026-06-24T17:09:17.569Z]
[2026-06-24T17:09:17.569Z] - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a40e8159cbec2d27.xml -
[2026-06-24T17:09:17.569Z] ============================== 1 skipped in 0.02s ==============================
[2026-06-24T17:09:17.569Z] W0624 17:08:27.828000 2073428 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.569Z] Test results will be stored in test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-df997b6851182d18.xml
[2026-06-24T17:09:17.569Z] ============================= test session starts ==============================
[2026-06-24T17:09:17.569Z] platform linux -- Python 3.12.13, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
[2026-06-24T17:09:17.569Z] cachedir: .pytest_cache
[2026-06-24T17:09:17.569Z] hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
[2026-06-24T17:09:17.569Z] rootdir: /var/lib/jenkins/pytorch
[2026-06-24T17:09:17.569Z] configfile: pytest.ini
[2026-06-24T17:09:17.569Z] plugins: flakefinder-1.1.0, cpp-2.3.0, subtests-0.13.1, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.3.0, hypothesis-6.56.4, typeguard-4.3.0
[2026-06-24T17:09:17.569Z] collecting ... collected 1 item
[2026-06-24T17:09:17.569Z] stepcurrent: previously run test not found, not skipping.
[2026-06-24T17:09:17.569Z] Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum
[2026-06-24T17:09:17.569Z]
[2026-06-24T17:09:17.569Z] distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum <- ../../../../opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/distributed/distributed_test.py SKIPPED [0.0005s] (Nccl does not support CPU tensors) [100%]
[2026-06-24T17:09:17.569Z]
[2026-06-24T17:09:17.569Z] - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-df997b6851182d18.xml -
[2026-06-24T17:09:17.569Z] ============================== 1 skipped in 0.02s ==============================
[2026-06-24T17:09:17.569Z] W0624 17:08:32.631000 2073513 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.569Z] Test results will be stored in test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a2a56f8ac2285aab.xml
[2026-06-24T17:09:17.569Z] ============================= test session starts ==============================
[2026-06-24T17:09:17.569Z] platform linux -- Python 3.12.13, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
[2026-06-24T17:09:17.569Z] cachedir: .pytest_cache
[2026-06-24T17:09:17.569Z] hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
[2026-06-24T17:09:17.569Z] rootdir: /var/lib/jenkins/pytorch
[2026-06-24T17:09:17.569Z] configfile: pytest.ini
[2026-06-24T17:09:17.569Z] plugins: flakefinder-1.1.0, cpp-2.3.0, subtests-0.13.1, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.3.0, hypothesis-6.56.4, typeguard-4.3.0
[2026-06-24T17:09:17.569Z] collecting ... collected 1 item
[2026-06-24T17:09:17.569Z] stepcurrent: previously run test not found, not skipping.
[2026-06-24T17:09:17.569Z] Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda
[2026-06-24T17:09:17.569Z]
[2026-06-24T17:09:17.569Z] distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda <- ../../../../opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/distributed/distributed_test.py I0624 17:08:34.114000 2073513 site-packages/torch/testing/_internal/common_distributed.py:895] Started process 0 with pid 2073599
[2026-06-24T17:09:17.569Z] I0624 17:08:34.115000 2073513 site-packages/torch/testing/_internal/common_distributed.py:895] Started process 1 with pid 2073600
[2026-06-24T17:09:17.569Z] I0624 17:08:34.116000 2073513 site-packages/torch/testing/_internal/common_distributed.py:895] Started process 2 with pid 2073601
[2026-06-24T17:09:17.569Z] I0624 17:08:34.117000 2073513 site-packages/torch/testing/_internal/common_distributed.py:895] Started process 3 with pid 2073602
[2026-06-24T17:09:17.569Z] W0624 17:08:36.704000 2073599 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.569Z] W0624 17:08:36.758000 2073601 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.569Z] W0624 17:08:36.824000 2073600 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.569Z] W0624 17:08:36.885000 2073602 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:38 2073602:2073602 ActivityProfilerController.cpp:415] profiler_start
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:38 2073600:2073600 ActivityProfilerController.cpp:415] profiler_start
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:38 2073599:2073599 ActivityProfilerController.cpp:415] profiler_start
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:38 2073601:2073601 ActivityProfilerController.cpp:415] profiler_start
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:46 2073600:2073600 ActivityProfilerController.cpp:455] profiler_stop
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:46 2073599:2073599 ActivityProfilerController.cpp:455] profiler_stop
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:46 2073600:2073600 ActivityProfilerController.cpp:415] profiler_start
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:46 2073601:2073601 ActivityProfilerController.cpp:455] profiler_stop
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:46 2073599:2073599 ActivityProfilerController.cpp:415] profiler_start
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:46 2073601:2073601 ActivityProfilerController.cpp:415] profiler_start
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:46 2073602:2073602 ActivityProfilerController.cpp:455] profiler_stop
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:46 2073602:2073602 ActivityProfilerController.cpp:415] profiler_start
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:46 2073601:2073601 ActivityProfilerController.cpp:455] profiler_stop
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:46 2073600:2073600 ActivityProfilerController.cpp:455] profiler_stop
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:46 2073602:2073602 ActivityProfilerController.cpp:455] profiler_stop
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:46 2073599:2073599 ActivityProfilerController.cpp:455] profiler_stop
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:46 2073601:2073601 ActivityProfilerController.cpp:415] profiler_start
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:46 2073600:2073600 ActivityProfilerController.cpp:415] profiler_start
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:46 2073599:2073599 ActivityProfilerController.cpp:415] profiler_start
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:46 2073602:2073602 ActivityProfilerController.cpp:415] profiler_start
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:46 2073602:2073602 ActivityProfilerController.cpp:455] profiler_stop
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:46 2073599:2073599 ActivityProfilerController.cpp:455] profiler_stop
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:46 2073601:2073601 ActivityProfilerController.cpp:455] profiler_stop
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:46 2073600:2073600 ActivityProfilerController.cpp:455] profiler_stop
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:46 2073602:2073602 ActivityProfilerController.cpp:415] profiler_start
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:46 2073601:2073601 ActivityProfilerController.cpp:415] profiler_start
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:46 2073599:2073599 ActivityProfilerController.cpp:415] profiler_start
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:46 2073600:2073600 ActivityProfilerController.cpp:415] profiler_start
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:46 2073602:2073602 ActivityProfilerController.cpp:455] profiler_stop
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:46 2073600:2073600 ActivityProfilerController.cpp:455] profiler_stop
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:46 2073601:2073601 ActivityProfilerController.cpp:455] profiler_stop
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:46 2073599:2073599 ActivityProfilerController.cpp:455] profiler_stop
[2026-06-24T17:09:17.569Z] PASSED [14.5025s] [100%]
[2026-06-24T17:09:17.569Z]
[2026-06-24T17:09:17.569Z] - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a2a56f8ac2285aab.xml -
[2026-06-24T17:09:17.569Z] ============================== 1 passed in 14.55s ==============================
[2026-06-24T17:09:17.569Z] W0624 17:08:52.077000 2073991 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.569Z] Test results will be stored in test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-23e28fd33188f6d4.xml
[2026-06-24T17:09:17.569Z] ============================= test session starts ==============================
[2026-06-24T17:09:17.569Z] platform linux -- Python 3.12.13, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
[2026-06-24T17:09:17.569Z] cachedir: .pytest_cache
[2026-06-24T17:09:17.569Z] hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
[2026-06-24T17:09:17.569Z] rootdir: /var/lib/jenkins/pytorch
[2026-06-24T17:09:17.569Z] configfile: pytest.ini
[2026-06-24T17:09:17.569Z] plugins: flakefinder-1.1.0, cpp-2.3.0, subtests-0.13.1, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.3.0, hypothesis-6.56.4, typeguard-4.3.0
[2026-06-24T17:09:17.569Z] collecting ... collected 1 item
[2026-06-24T17:09:17.569Z] stepcurrent: previously run test not found, not skipping.
[2026-06-24T17:09:17.569Z] Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda_complex
[2026-06-24T17:09:17.569Z]
[2026-06-24T17:09:17.569Z] distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda_complex <- ../../../../opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/distributed/distributed_test.py I0624 17:08:53.558000 2073991 site-packages/torch/testing/_internal/common_distributed.py:895] Started process 0 with pid 2074058
[2026-06-24T17:09:17.569Z] I0624 17:08:53.559000 2073991 site-packages/torch/testing/_internal/common_distributed.py:895] Started process 1 with pid 2074059
[2026-06-24T17:09:17.569Z] I0624 17:08:53.560000 2073991 site-packages/torch/testing/_internal/common_distributed.py:895] Started process 2 with pid 2074060
[2026-06-24T17:09:17.569Z] I0624 17:08:53.561000 2073991 site-packages/torch/testing/_internal/common_distributed.py:895] Started process 3 with pid 2074061
[2026-06-24T17:09:17.569Z] W0624 17:08:56.143000 2074058 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.569Z] W0624 17:08:56.205000 2074060 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.569Z] W0624 17:08:56.216000 2074061 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.569Z] W0624 17:08:56.234000 2074059 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:57 2074061:2074061 ActivityProfilerController.cpp:415] profiler_start
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:57 2074060:2074060 ActivityProfilerController.cpp:415] profiler_start
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:57 2074058:2074058 ActivityProfilerController.cpp:415] profiler_start
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:08:57 2074059:2074059 ActivityProfilerController.cpp:415] profiler_start
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:09:05 2074058:2074058 ActivityProfilerController.cpp:455] profiler_stop
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:09:05 2074061:2074061 ActivityProfilerController.cpp:455] profiler_stop
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:09:05 2074061:2074061 ActivityProfilerController.cpp:415] profiler_start
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:09:05 2074058:2074058 ActivityProfilerController.cpp:415] profiler_start
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:09:05 2074059:2074059 ActivityProfilerController.cpp:455] profiler_stop
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:09:05 2074060:2074060 ActivityProfilerController.cpp:455] profiler_stop
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:09:05 2074060:2074060 ActivityProfilerController.cpp:415] profiler_start
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:09:05 2074059:2074059 ActivityProfilerController.cpp:415] profiler_start
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:09:05 2074060:2074060 ActivityProfilerController.cpp:455] profiler_stop
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:09:05 2074059:2074059 ActivityProfilerController.cpp:455] profiler_stop
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:09:05 2074061:2074061 ActivityProfilerController.cpp:455] profiler_stop
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:09:05 2074058:2074058 ActivityProfilerController.cpp:455] profiler_stop
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:09:05 2074059:2074059 ActivityProfilerController.cpp:415] profiler_start
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:09:05 2074060:2074060 ActivityProfilerController.cpp:415] profiler_start
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:09:05 2074058:2074058 ActivityProfilerController.cpp:415] profiler_start
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:09:05 2074061:2074061 ActivityProfilerController.cpp:415] profiler_start
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:09:05 2074058:2074058 ActivityProfilerController.cpp:455] profiler_stop
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:09:05 2074060:2074060 ActivityProfilerController.cpp:455] profiler_stop
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:09:05 2074059:2074059 ActivityProfilerController.cpp:455] profiler_stop
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:09:05 2074061:2074061 ActivityProfilerController.cpp:455] profiler_stop
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:09:05 2074058:2074058 ActivityProfilerController.cpp:415] profiler_start
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:09:05 2074060:2074060 ActivityProfilerController.cpp:415] profiler_start
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:09:05 2074059:2074059 ActivityProfilerController.cpp:415] profiler_start
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:09:05 2074061:2074061 ActivityProfilerController.cpp:415] profiler_start
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:09:05 2074059:2074059 ActivityProfilerController.cpp:455] profiler_stop
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:09:05 2074058:2074058 ActivityProfilerController.cpp:455] profiler_stop
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:09:05 2074060:2074060 ActivityProfilerController.cpp:455] profiler_stop
[2026-06-24T17:09:17.569Z] USDT:2026-06-24 17:09:05 2074061:2074061 ActivityProfilerController.cpp:455] profiler_stop
[2026-06-24T17:09:17.569Z] PASSED [14.5030s] [100%]
[2026-06-24T17:09:17.569Z]
[2026-06-24T17:09:17.569Z] - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-23e28fd33188f6d4.xml -
[2026-06-24T17:09:17.569Z] ============================== 1 passed in 14.55s ==============================
[2026-06-24T17:09:17.569Z] W0624 17:09:11.519000 2074446 site-packages/torch/_native/cutedsl_utils.py:55] CuTeDSL operators require optional Python packages `nvidia-cutlass-dsl` and `apache-tvm-ffi`; missing optional dependency `nvidia_cutlass_dsl` (importlib.util.find_spec(nvidia_cutlass_dsl) failed)
[2026-06-24T17:09:17.569Z] Test results will be stored in test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-123145b977878efb.xml
[2026-06-24T17:09:17.569Z] ============================= test session starts ==============================
[2026-06-24T17:09:17.569Z] platform linux -- Python 3.12.13, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
[2026-06-24T17:09:17.569Z] cachedir: .pytest_cache
[2026-06-24T17:09:17.569Z] hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
[2026-06-24T17:09:17.569Z] rootdir: /var/lib/jenkins/pytorch
[2026-06-24T17:09:17.569Z] configfile: pytest.ini
[2026-06-24T17:09:17.569Z] plugins: flakefinder-1.1.0, cpp-2.3.0, subtests-0.13.1, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.3.0, hypothesis-6.56.4, typeguard-4.3.0
[2026-06-24T17:09:17.569Z] collecting ... collected 1 item
[2026-06-24T17:09:17.569Z] stepcurrent: previously run test not found, not skipping.
[2026-06-24T17:09:17.569Z] Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_any_source_torch_profiler
[2026-06-24T17:09:17.569Z]
[2026-06-24T17:09:17.569Z] distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_any_source_torch_profiler <- ../../../../opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/distributed/distributed_test.py SKIPPED [0.0006s] (nccl does not support send/recv from any source) [100%]
[2026-06-24T17:09:17.569Z]
[2026-06-24T17:09:17.569Z] - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-123145b977878efb.xml -
[2026-06-24T17:09:17.569Z] ============================== 1 skipped in 0.02s ==============================
[2026-06-24T17:09:17.569Z] Traceback (most recent call last):
[2026-06-24T17:09:17.569Z] File "/var/lib/jenkins/pytorch/test/distributed/test_distributed_spawn.py", line 60, in <module>
[2026-06-24T17:09:17.569Z] run_tests()
[2026-06-24T17:09:17.569Z] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 1327, in run_tests
[2026-06-24T17:09:17.569Z] raise AssertionError(
[2026-06-24T17:09:17.569Z] AssertionError: 1 unit test(s) failed:
[2026-06-24T17:09:17.569Z] distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_apply_optim_in_backward
[2026-06-24T17:09:17.569Z]
[2026-06-24T17:09:17.569Z] FINISHED PRINTING LOG FILE of distributed/test_distributed_spawn 8/10 (test/test-reports/distributed.test_distributed_spawn_8.10_bf1774ab1c188026_.log)
[2026-06-24T17:09:17.569Z]
[2026-06-24T17:09:17.569Z] Finished distributed/test_distributed_spawn 8/10 ... [2026-06-24 17:09:15.260776][2278980.089579359], took 9.25min
[2026-06-24T17:09:17.569Z] distributed/test_distributed_spawn 8/10 failed!
[2026-06-24T17:09:17.569Z] Emitting td_test_failure_stats_v2
[2026-06-24T17:09:17.569Z] /var/lib/jenkins/pytorch/tools/stats/upload_metrics.py:140: UserWarning: Not emitting metrics for td_test_failure_stats_v2. Missing repo. Please set the GITHUB_REPOSITORY environment variable to pass in this value.
[2026-06-24T17:09:17.570Z] warn(f"Not emitting metrics for {metric_name}. {e}")
[2026-06-24T17:09:17.570Z] Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ab4fd9901ab0a0e7.xml
[2026-06-24T17:09:17.570Z] Found job id: None
[2026-06-24T17:09:17.570Z] Failed to parse and upload json test reports: Unable to locate credentials
[2026-06-24T17:09:17.570Z] GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading
[2026-06-24T17:09:17.570Z] Uploading artifacts took 0.00 seconds
[2026-06-24T17:09:17.570Z] Traceback (most recent call last):
[2026-06-24T17:09:17.570Z] File "/var/lib/jenkins/pytorch/test/run_test.py", line 2319, in <module>
[2026-06-24T17:09:17.570Z] main()
[2026-06-24T17:09:17.570Z] File "/var/lib/jenkins/pytorch/test/run_test.py", line 2270, in main
[2026-06-24T17:09:17.570Z] run_tests(
[2026-06-24T17:09:17.570Z] File "/var/lib/jenkins/pytorch/test/run_test.py", line 2091, in run_tests
[2026-06-24T17:09:17.570Z] raise RuntimeError(failure.message + keep_going_message)
[2026-06-24T17:09:17.570Z] RuntimeError: distributed/test_distributed_spawn 8/10 failed!
[2026-06-24T17:09:17.570Z]
[2026-06-24T17:09:17.570Z] Tip: You can keep running tests even on failure by passing --keep-going to run_test.py.
[2026-06-24T17:09:17.570Z] If running on CI, add the 'keep-going' label to your PR and rerun your jobs.
[2026-06-24T17:09:17.570Z]
[2026-06-24T17:09:17.570Z] real 116m57.364s
[2026-06-24T17:09:17.570Z] user 448m55.477s
[2026-06-24T17:09:17.570Z] sys 115m24.357s
[2026-06-24T17:09:17.570Z] + sccache_epilogue
[2026-06-24T17:09:17.570Z] + echo '::group::Sccache Compilation Log'
[2026-06-24T17:09:17.570Z] ::group::Sccache Compilation Log
[2026-06-24T17:09:17.570Z] =================== sccache compilation log ===================
[2026-06-24T17:09:17.570Z] + echo '=================== sccache compilation log ==================='
[2026-06-24T17:09:17.570Z] + python /var/lib/jenkins/pytorch/.ci/pytorch/print_sccache_log.py /root/sccache_error.log
[2026-06-24T17:09:17.570Z] + echo '=========== If your build fails, please take a look at the log above for possible reasons ==========='
[2026-06-24T17:09:17.570Z] + sccache --show-stats
[2026-06-24T17:09:17.570Z] =========== If your build fails, please take a look at the log above for possible reasons ===========
[2026-06-24T17:09:17.570Z] Compile requests 1
[2026-06-24T17:09:17.570Z] Compile requests executed 0
[2026-06-24T17:09:17.570Z] Cache hits 0
[2026-06-24T17:09:17.570Z] Cache misses 0
[2026-06-24T17:09:17.570Z] Cache hits rate -
[2026-06-24T17:09:17.570Z] Cache timeouts 0
[2026-06-24T17:09:17.570Z] Cache read errors 0
[2026-06-24T17:09:17.570Z] Forced recaches 0
[2026-06-24T17:09:17.570Z] Cache write errors 0
[2026-06-24T17:09:17.570Z] Cache errors 0
[2026-06-24T17:09:17.570Z] Compilations 0
[2026-06-24T17:09:17.570Z] Compilation failures 0
[2026-06-24T17:09:17.570Z] Non-cacheable compilations 0
[2026-06-24T17:09:17.570Z] Non-cacheable calls 0
[2026-06-24T17:09:17.570Z] Non-compilation calls 1
[2026-06-24T17:09:17.570Z] Unsupported compiler calls 0
[2026-06-24T17:09:17.570Z] Average cache write 0.000 s
[2026-06-24T17:09:17.570Z] Average compiler 0.000 s
[2026-06-24T17:09:17.570Z] Average cache read hit 0.000 s
[2026-06-24T17:09:17.570Z] Failed distributed compilations 0
[2026-06-24T17:09:17.570Z] Cache location Local disk: "/root/.cache/sccache"
[2026-06-24T17:09:17.570Z] Use direct/preprocessor mode? yes
[2026-06-24T17:09:17.570Z] Version (client) 0.13.0
[2026-06-24T17:09:17.570Z] Max cache size 10 GiB
[2026-06-24T17:09:17.570Z] + sccache --stop-server
[2026-06-24T17:09:17.570Z] Stopping sccache server...
[2026-06-24T17:09:17.570Z] Compile requests 1
[2026-06-24T17:09:17.570Z] Compile requests executed 0
[2026-06-24T17:09:17.570Z] Cache hits 0
[2026-06-24T17:09:17.570Z] Cache misses 0
[2026-06-24T17:09:17.570Z] Cache hits rate -
[2026-06-24T17:09:17.570Z] Cache timeouts 0
[2026-06-24T17:09:17.570Z] Cache read errors 0
[2026-06-24T17:09:17.570Z] Forced recaches 0
[2026-06-24T17:09:17.570Z] Cache write errors 0
[2026-06-24T17:09:17.570Z] Cache errors 0
[2026-06-24T17:09:17.570Z] Compilations 0
[2026-06-24T17:09:17.570Z] Compilation failures 0
[2026-06-24T17:09:17.570Z] Non-cacheable compilations 0
[2026-06-24T17:09:17.570Z] Non-cacheable calls 0
[2026-06-24T17:09:17.570Z] Non-compilation calls 1
[2026-06-24T17:09:17.570Z] Unsupported compiler calls 0
[2026-06-24T17:09:17.570Z] Average cache write 0.000 s
[2026-06-24T17:09:17.570Z] Average compiler 0.000 s
[2026-06-24T17:09:17.570Z] Average cache read hit 0.000 s
[2026-06-24T17:09:17.570Z] Failed distributed compilations 0
[2026-06-24T17:09:17.570Z] Cache location Local disk: "/root/.cache/sccache"
[2026-06-24T17:09:17.570Z] Use direct/preprocessor mode? yes
[2026-06-24T17:09:17.570Z] Version (client) 0.13.0
[2026-06-24T17:09:17.570Z] Max cache size 10 GiB
[2026-06-24T17:09:17.570Z] + echo ::endgroup::
[2026-06-24T17:09:17.570Z] ::endgroup::
[2026-06-24T17:09:17.570Z] + cp -RT test/test-reports /host_workspace/pytorch_reports
[2026-06-24T17:09:17.570Z] + chmod -R 777 /host_workspace/pytorch_log /host_workspace/pytorch_reports
[2026-06-24T17:09:17.570Z] + git clean -fdx
[2026-06-24T17:09:17.570Z] Removing .additional_ci_files/
[2026-06-24T17:09:17.570Z] Removing .pytest_cache/
[2026-06-24T17:09:17.570Z] Removing build/
[2026-06-24T17:09:17.570Z] Removing dist/
[2026-06-24T17:09:17.570Z] Removing test/.pytorch-disabled-tests.json
[2026-06-24T17:09:17.570Z] Removing test/__pycache__/
[2026-06-24T17:09:17.570Z] Removing test/distributed/__pycache__/
[2026-06-24T17:09:17.570Z] Removing test/test-reports/
[2026-06-24T17:09:17.570Z] Removing test_artifacts.zip
[2026-06-24T17:09:17.570Z] Removing tools/__pycache__/
[2026-06-24T17:09:17.570Z] Removing tools/stats/__pycache__/
[2026-06-24T17:09:17.570Z] Removing tools/testing/__pycache__/
[2026-06-24T17:09:17.570Z] Removing tools/testing/target_determination/__pycache__/
[2026-06-24T17:09:17.570Z] Removing tools/testing/target_determination/heuristics/__pycache__/
[2026-06-24T17:09:17.570Z] Removing torch-2.12.0+git20d2603-cp312-cp312-linux_x86_64.whl
Tests / Test Distributed / Test Distributed / Run pytorch_distributed_2 / Error signal
Error in error step, with arguments pytorch_distributed_2 failed.
pytorch_distributed_2 failed
Output truncated.
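The log above contains many PASSED and SKIPPED sessions, and the actual failure is named only once, under the "unit test(s) failed:" header. A minimal sketch of pulling the failing test IDs out of a saved copy of such a log follows. The helper name `failed_tests`, the two regular expressions, and the assumption that every line carries a bracketed timestamp prefix are illustrative choices based on the lines shown here. They are not part of PyTorch's CI tooling.

```python
import re

# Assumption: each console line starts with a bracketed ISO timestamp,
# as in the log above. Strip it before matching.
TS_PREFIX = re.compile(r"^\[\d{4}-\d{2}-\d{2}T[\d:.]+Z\]\s?")
# Header printed by run_tests() in the traceback shown above.
FAILED_HEADER = re.compile(r"AssertionError: (\d+) unit test\(s\) failed:")


def failed_tests(log_text):
    """Return the test IDs listed under each 'N unit test(s) failed:' header."""
    lines = [TS_PREFIX.sub("", ln).strip() for ln in log_text.splitlines()]
    found = []
    for i, line in enumerate(lines):
        match = FAILED_HEADER.search(line)
        if match:
            count = int(match.group(1))
            # The IDs follow the header, one per line; keep only pytest-style IDs.
            found.extend(ln for ln in lines[i + 1 : i + 1 + count] if "::" in ln)
    return found


sample = """\
[2026-06-24T17:09:17.569Z] AssertionError: 1 unit test(s) failed:
[2026-06-24T17:09:17.569Z] distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_apply_optim_in_backward
[2026-06-24T17:09:17.569Z]
"""
print(failed_tests(sample))
```

Applied to this run, the only ID reported is `test_ddp_apply_optim_in_backward` in shard 8/10 of `distributed/test_distributed_spawn`. The truncated log does not show that test's own output, so the cause of the failure cannot be read from this excerpt.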
Details
- Kill older PR Builds (1.7 sec)
- Initialize (2 hr 22 min)
- Download CI scripts (37 sec)
- Checkout Pytorch (47 sec)
- Check base Docker image existence (12 sec)
- Build Docker image (1 hr 17 min)
- Build PyTorch (1 hr 0 min)
- Tests (9 hr 23 min)
- Test PyTorch (8 ms)
- Test PyTorch (2 hr 27 min)
- Run pytorch_test_1 (1 hr 12 min)
- Run pytorch_test_2 (1 hr 14 min)
- Test PyTorch (2 hr 27 min)
- Test Distributed (9 ms)
- Test Inductor (22 ms)
- Test Inductor (2 hr 36 min)
- Run pytorch_inductor_1 (2 hr 36 min)
- Test Inductor (2 hr 36 min)
- Test PyTorch Slow (8 ms)
- Test PyTorch Slow (7.9 sec)
- Microbenchmark (15 sec)
- Microbenchmark (7.9 sec)
- Test PyTorch (8 ms)
- Post Build (3.9 sec)
- Declarative: Post Actions (6.2 sec)