[release/2.12] fix leak in CUDAGraph::capture_end (#180395)#3357
Merged
pragupta merged 1 commit on Jun 23, 2026
`CUDAGraph::capture_end()` previously performed a CUDA error check (`AT_CUDA_CHECK(endCaptureErr)`) immediately after calling `cudaStreamEndCapture`. If this CUDA call returned an error, the check would throw an exception, bypassing the subsequent calls to:

- `c10::cuda::CUDACachingAllocator::endAllocateToPool`
- `at::getHostAllocator(at::kCUDA)->end_allocate_to_pool`

This left the `CUDACachingAllocator` in a state where it believed a capture was still underway for that specific memory pool. Any subsequent attempt to synchronize (e.g., during garbage collection of a `MemPool` object) would then trigger the `captures_underway.empty()` assertion failure and crash the process:

```
libc++abi: terminating due to uncaught exception of type c10::Error: captures_underway.empty() INTERNAL ASSERT FAILED at "third_party/py/torch/c10/cuda/CUDACachingAllocator.cpp":3941, please report a bug to PyTorch.
Exception raised from synchronize_and_free_events at third_party/py/torch/c10/cuda/CUDACachingAllocator.cpp:3941 (most recent call first):
C++ CapturedTraceback:
#4 0x5583bafd4db5: c10::Error::Error(c10::SourceLocation, std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char>>) from ??:0
#5 0x5583bafd2077: c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0
#6 0x5583a3d50749: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, c10::detail::CompileTimeEmptyString) from ??:0
#7 0x5583bac90543: c10::cuda::CUDACachingAllocator::Native::DeviceCachingAllocator::synchronize_and_free_events(std::__u::shared_ptr<c10::GatheredContext> const&, c10::cuda::CUDACachingAllocator::Native::(anonymous namespace)::PrivatePool*) from ??:0
#8 0x5583bac7f720: c10::cuda::CUDACachingAllocator::Native::DeviceCachingAllocator::release_cached_blocks(std::__u::shared_ptr<c10::GatheredContext> const&, std::__u::pair<unsigned long long, unsigned long long>) from ??:0
#9 0x5583bac93670: c10::cuda::CUDACachingAllocator::Native::DeviceCachingAllocator::emptyCache(std::__u::pair<unsigned long long, unsigned long long>) from ??:0
#10 0x5583bac6355f: c10::cuda::CUDACachingAllocator::Native::NativeCachingAllocator::emptyCache(std::__u::pair<unsigned long long, unsigned long long>) from ??:0
#11 0x5583a9758035: at::cuda::MemPool::~MemPool() from ??:0
```

This change reorders the logic in `CUDAGraph::capture_end()` so that the allocators' "end capture" notifications (both calls listed above) run before the error status of the CUDA call is checked. This keeps the allocators' internal state synchronized with the actual state of the CUDA stream, preventing "zombie" captures from leaking and causing crashes in unrelated code paths.

Pull Request resolved: pytorch#180395
Approved by: https://github.com/eee4017, https://github.com/Skylion007
(cherry picked from commit 6fa359b)
Jenkins build for commit 9d4d043fd0a194b97dad3d733d55be11abae4751 finished as FAILURE.
Author:

`!cherry-pick --onto release/2.11 release/2.10`
This was referenced Jun 24, 2026
Created branch autogenerated/release/2.11_cherry-pick_pr-3357 and #3367. It contains a merge conflict. Please resolve it.
Created branch autogenerated/release/2.10_cherry-pick_pr-3357 and #3368. It contains a merge conflict. Please resolve it.
Comment processed by Build
Minimal reproducer is just those two tests in one process:
Cherry-picked to release/2.11 branch via #3367
Cherry-picked to release/2.10 branch via #3368