Enhanced support for memory-tracking raft resources #3004

Open
achirkin wants to merge 5 commits into rapidsai:main from achirkin:enh-memory-resources


@achirkin
Contributor

  1. Change the host memory resource to have the same owning semantics as the device memory resources, following Migrate RMM usage to CCCL MR design #2996.
  2. Add a workaround to statistics_adaptor.hpp so that it compiles with nvcc (see the code comment).
  3. Add memory_stats_resources. In contrast to memory_tracking_resources, it does not stream memory usage online; it only reports the overall usage metrics.
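To illustrate the difference between items 1 and 3 (shared ownership of the upstream, and aggregated rather than streamed statistics), here is a minimal host-only C++ sketch of an allocator adaptor. The names `stats_adaptor`, `malloc_upstream`, and `usage_stats` are hypothetical and are not the RAFT or RMM API; the actual memory_stats_resources may record different metrics and is not necessarily structured this way.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>
#include <utility>

// Hypothetical aggregated metrics; the real memory_stats_resources may differ.
struct usage_stats {
  std::size_t current_bytes = 0;  // bytes currently allocated
  std::size_t peak_bytes    = 0;  // high-water mark of current_bytes
  std::size_t total_bytes   = 0;  // sum of all allocation sizes
  std::size_t num_allocs    = 0;  // number of successful allocations
};

// Hypothetical upstream resource backed by std::malloc.
struct malloc_upstream {
  void* allocate(std::size_t bytes) {
    void* p = std::malloc(bytes);
    if (p == nullptr) { throw std::bad_alloc{}; }
    return p;
  }
  void deallocate(void* p, std::size_t) noexcept { std::free(p); }
};

// Hypothetical adaptor: copies share ownership of the same upstream and the
// same counters. Nothing is emitted per allocation; callers query the totals
// with stats(). Not thread-safe (plain counters, no atomics or locks).
template <typename Upstream>
class stats_adaptor {
 public:
  explicit stats_adaptor(std::shared_ptr<Upstream> upstream)
    : upstream_(std::move(upstream)), stats_(std::make_shared<usage_stats>()) {}

  void* allocate(std::size_t bytes) {
    void* p = upstream_->allocate(bytes);  // throws before counters change
    stats_->current_bytes += bytes;
    stats_->peak_bytes = std::max(stats_->peak_bytes, stats_->current_bytes);
    stats_->total_bytes += bytes;
    stats_->num_allocs += 1;
    return p;
  }

  void deallocate(void* p, std::size_t bytes) noexcept {
    assert(bytes <= stats_->current_bytes);  // sanity check in debug builds
    upstream_->deallocate(p, bytes);
    stats_->current_bytes -= bytes;
  }

  // Snapshot of the aggregated metrics.
  usage_stats stats() const { return *stats_; }

 private:
  std::shared_ptr<Upstream> upstream_;
  std::shared_ptr<usage_stats> stats_;
};
```

Because the counters live behind a `shared_ptr`, a copy of the adaptor (for example, one stored inside a resources handle) reports the same totals as the original, which is the point of giving host and device resources the same owning semantics.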

@achirkin achirkin self-assigned this Apr 24, 2026
@achirkin achirkin requested review from a team as code owners April 24, 2026 14:09
@achirkin achirkin added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Apr 24, 2026
@achirkin achirkin moved this to In Progress in Unstructured Data Processing Apr 24, 2026
@achirkin achirkin changed the title Enhanced support for memory tracking raft resources Enhanced support for memory-tracking raft resources Apr 24, 2026
Comment on lines +75 to +79
// NVCC injects __host__ __device__ on std::shared_ptr special members,
// which makes the *implicit* or *defaulted* special members __host__
// __device__ too. That conflicts with Upstream types whose special
// members are __host__ only (e.g. rmm::device_async_resource_ref).
// User-defined bodies (not = default) force plain __host__ execution space.
Contributor


Good find. Are there changes you would suggest for RMM or CCCL?

Contributor Author


Thanks! I think CCCL/RMM are fine, because cuda::mr::shared_resource defines its copy/move constructors explicitly.
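The workaround described in the code comment can be sketched in host-only C++ as below. `tracked_resource` and `plain_upstream` are hypothetical stand-ins, not types from RAFT, RMM, or CCCL. The sketch only shows the shape of the fix: every special member has a user-written body instead of `= default`, which, per the comment, keeps nvcc from treating these members as `__host__ __device__`. A plain host compiler behaves identically either way, so this illustrates the pattern rather than reproducing the nvcc diagnostic.

```cpp
#include <memory>
#include <utility>

// Hypothetical upstream; under nvcc its special members would be __host__ only.
struct plain_upstream {
  int id = 0;
};

// Hypothetical wrapper holding a std::shared_ptr member. Instead of "= default",
// every special member has an explicit body, mirroring the workaround in the
// comment above.
class tracked_resource {
 public:
  explicit tracked_resource(std::shared_ptr<plain_upstream> upstream)
    : upstream_(std::move(upstream)) {}

  // Copy: share the same upstream (reference count goes up).
  tracked_resource(const tracked_resource& other) : upstream_(other.upstream_) {}

  // Move: take over the other's pointer; the moved-from object holds null.
  tracked_resource(tracked_resource&& other) noexcept
    : upstream_(std::move(other.upstream_)) {}

  tracked_resource& operator=(const tracked_resource& other) {
    upstream_ = other.upstream_;
    return *this;
  }

  tracked_resource& operator=(tracked_resource&& other) noexcept {
    upstream_ = std::move(other.upstream_);
    return *this;
  }

  ~tracked_resource() {}

  const plain_upstream* upstream() const { return upstream_.get(); }

 private:
  std::shared_ptr<plain_upstream> upstream_;
};
```

The explicit bodies do exactly what the defaulted members would do for a `shared_ptr` member; only the way they are declared changes.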

@@ -0,0 +1,237 @@
/*
Member

@cjnolet May 6, 2026


Why wouldn't we put this in raft/core/resources instead of raft/util? It would be nice to put all of these in the same place, given that RAFT is more than just memory resources. That way users can find them easily. I think we should do the same with the above instead of putting them in mr.

Contributor Author


I followed the example of memory_tracking_resources here. But that was also introduced recently (26.04), so we can change both without much risk of breaking things.
Just to clarify the reasoning behind my choice for these two: I put them in the util folder because they are not part of a "normal" algorithm flow, but rather utilities to analyze/profile memory-related resource usage. It's not a strong preference, though. Would you like me to move them both to the core folder (and mark the PR as breaking)?

NB: the current layout of raft (main + this PR):

  • raft/core/resources/ - individual resources, such as cuda stream, cublas, memory resources
  • raft/core/ - raft::resources itself (plus memory_tracking_resources, memory_stats_resources, and dry_run_resources, if we decide to move them)
  • raft/mr/ - CCCL/rmm compatible memory resources
  • raft/util/ - current location of memory_tracking_resources, memory_stats_resources, and dry_run_resources


Labels

improvement Improvement / enhancement to an existing function non-breaking Non-breaking change

Projects

Status: In Progress


3 participants