fix: Fix fp8 memory fragmentation #2670
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com> WIP oom Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com> for test Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com> memory
Signed-off-by: Anna Shors <ashors@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
…refit Signed-off-by: Anna Shors <ashors@nvidia.com>
    if hasattr(te, "module") and hasattr(te.module.base, "clear_workspace"):
        te.module.base.clear_workspace()
except ImportError:
    pass
Hi @ashors1, this might be my issue, but I can't find this in TE, and I don't think this piece of code helps with memory savings. Maybe we can simply remove it?
gc.collect()
torch.cuda.empty_cache()
Would it be more efficient to run gc.collect() and torch.cuda.empty_cache() just once after all references are cleared? Calling them frequently might introduce unnecessary overhead to the training pipeline.
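A minimal sketch of the pattern suggested here, assuming the references to release are plain attributes on some holder objects. The names `release_references`, `holders`, and `attr_names` are hypothetical and do not come from this PR; only `gc.collect()` and `torch.cuda.empty_cache()` are taken from the diff.

```python
import gc

try:
    import torch
except ImportError:  # torch is optional for this sketch
    torch = None


def release_references(holders, attr_names):
    """Clear the named attributes on every holder, then clean up once.

    Hypothetical helper: `holders` is any iterable of objects and
    `attr_names` lists the attributes that keep large buffers alive.
    Returns how many non-None attributes were cleared.
    """
    cleared = 0
    for holder in holders:
        for name in attr_names:
            if getattr(holder, name, None) is not None:
                setattr(holder, name, None)  # drop the reference only
                cleared += 1

    # One cleanup pass after all references are gone, instead of
    # calling gc.collect() / empty_cache() after each individual clear.
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()
    return cleared
```

Whether a single pass is sufficient depends on peak memory between the individual clears; if an intermediate allocation needs the freed blocks right away, the per-step calls in the diff may still be required.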
gc.collect()
torch.cuda.empty_cache()
Same as above:
Would it be more efficient to run gc.collect() and torch.cuda.empty_cache() just once after all references are cleared? Calling them frequently might introduce unnecessary overhead to the training pipeline.
ZhiyuLi-Nvidia left a comment
Thank you @ashors1 for driving the fix!
Just left some nits.
LGTM!
Signed-off-by: Anna Shors <ashors@nvidia.com>
Signed-off-by: Anna Shors <ashors@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
What does this PR do?
Add a one line overview of what this PR aims to accomplish.
Issues
closes #2003
Usage
# Add a code snippet demonstrating how to use this
Before your PR is "Ready for review"
Pre checks:
Additional Information