Enhanced support for memory-tracking raft resources #3004
achirkin wants to merge 5 commits into rapidsai:main from
// NVCC injects __host__ __device__ on std::shared_ptr special members,
// which makes the *implicit* or *defaulted* special members __host__
// __device__ too. That conflicts with Upstream types whose special
// members are __host__ only (e.g. rmm::device_async_resource_ref).
// User-defined bodies (not = default) force plain __host__ execution space.
Good find. Are there changes you would suggest for RMM or CCCL?
Thanks! I think CCCL/rmm are fine, because cuda::mr::shared_resource defines the copy/move constructors explicitly.
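For readers following the thread, the workaround described in the code comment can be sketched roughly as below. This is only an illustration under assumptions: `host_only_upstream` and `shared_wrapper` are hypothetical names, not types from raft, rmm, or CCCL, and the snippet builds with a plain host compiler, so it shows the shape of the fix (user-defined bodies instead of `= default`) rather than reproducing the NVCC diagnostic.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for an upstream type whose special members are
// host-only (the source comment names rmm::device_async_resource_ref as
// a real example; this struct is NOT that type).
struct host_only_upstream {
  int id;
};

// Hypothetical wrapper that keeps its state in a std::shared_ptr.
template <typename Upstream>
class shared_wrapper {
 public:
  explicit shared_wrapper(Upstream upstream)
    : state_{std::make_shared<Upstream>(std::move(upstream))}
  {
  }

  // User-defined bodies rather than `= default`: per the code comment in
  // the PR, this keeps the members in the plain host execution space
  // when the file is compiled by NVCC.
  shared_wrapper(const shared_wrapper& other) : state_{other.state_} {}
  shared_wrapper(shared_wrapper&& other) noexcept : state_{std::move(other.state_)} {}
  shared_wrapper& operator=(const shared_wrapper& other)
  {
    state_ = other.state_;
    return *this;
  }
  shared_wrapper& operator=(shared_wrapper&& other) noexcept
  {
    state_ = std::move(other.state_);
    return *this;
  }
  ~shared_wrapper() {}

  // Access the wrapped upstream; invalid on a moved-from wrapper.
  Upstream& upstream() const
  {
    assert(state_ != nullptr);
    return *state_;
  }

  // Number of wrappers currently sharing the same state.
  long use_count() const { return state_.use_count(); }

 private:
  std::shared_ptr<Upstream> state_;
};
```

Copies of `shared_wrapper` share one underlying state, which matches the reply above about `cuda::mr::shared_resource` defining its copy/move constructors explicitly.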
@@ -0,0 +1,237 @@
/*
Why wouldn't we put this in raft/core/resources instead of raft/util? It would be nice to put all of these in the same place, given RAFT is more than just memory resources. That way users can find them easily. I think we should do the same with the above instead of putting them in mr.
I followed the example of memory_tracking_resources here. But that was introduced recently too (26.04), so we can change both without risking breaking things too much.
Just to clarify the logic of my choice for these two: I've put them into the util folder because they are not part of a "normal" algorithm flow but rather utilities to analyze/profile memory-related resource usage. It's not a strong preference though. Would you like me to move them both to the core folder (and mark the PR as breaking)?
NB the current state of raft (main + PR):
raft/core/resources/ - individual resources, such as cuda stream, cublas, memory resources
raft/core/ - raft::resources itself (+ memory_tracking_resources, memory_stats_resources, dry_run_resources if we decide so)
raft/mr/ - CCCL/rmm compatible memory resources
raft/util/ - current location of memory_tracking_resources, memory_stats_resources, dry_run_resources
statistics_adaptor.hpp to compile via nvcc (see code comment)
memory_stats_resources. In contrast to memory_tracking_resources, this doesn't stream the memory usage online, but only reports the overall usage metrics.
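To make the "overall usage metrics" idea concrete, here is a minimal sketch of aggregate-only accounting: counters are updated on each allocation and deallocation, and nothing is emitted until the caller reads the totals. The `usage_stats` struct and its member names are hypothetical and chosen for illustration; they are not the API of memory_stats_resources or of any raft/rmm class, and the sketch is not thread-safe.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical aggregate counters; not part of raft or rmm.
struct usage_stats {
  std::size_t current_bytes = 0;  // bytes currently outstanding
  std::size_t peak_bytes    = 0;  // high-water mark of current_bytes
  std::size_t total_bytes   = 0;  // sum of all bytes ever allocated
  std::size_t n_allocations = 0;  // number of allocation calls

  // Record an allocation of `bytes` bytes.
  void on_allocate(std::size_t bytes)
  {
    current_bytes += bytes;
    total_bytes += bytes;
    ++n_allocations;
    peak_bytes = std::max(peak_bytes, current_bytes);
  }

  // Record a deallocation of `bytes` bytes.
  void on_deallocate(std::size_t bytes)
  {
    assert(bytes <= current_bytes);
    current_bytes -= bytes;
  }
};
```

A streaming tracker would instead report each change as it happens; here the caller only inspects the four counters once the workload has finished.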