Predictable raft::resources by achirkin · Pull Request #3005 · rapidsai/raft

achirkin · 2026-04-29T11:01:36Z

Make the raft::resources resource initialization semantics more predictable.

All resources are still initialized lazily on demand (no change), but behave as if they were initialized during the raft::resources handle construction: all copies of a raft::resources handle point to the same resources (not a breaking change; fixes the re-initialization issue).
set_resource changes to non-const semantics (breaking change).
Before: all set_xxx resource-updating calls operated on a const handle.
Now: all set_xxx calls require a non-const handle.
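
The semantics described above can be illustrated with a minimal sketch in plain C++. This is not the raft implementation: lazy_cell, lazy_handle, get and set are hypothetical names, and a single int stands in for a resource. It only assumes that copies of a handle share a cell, that the getter fills the cell on first use, and that the setter re-points one handle at a fresh cell.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in for a resource cell: a factory plus the lazily created value.
struct lazy_cell {
  std::function<int()> factory;
  std::shared_ptr<int> value;  // empty until the first get()
};

// Hypothetical stand-in for the handle; not the real raft::resources.
class lazy_handle {
 public:
  explicit lazy_handle(std::function<int()> factory) : cell_(std::make_shared<lazy_cell>())
  {
    cell_->factory = std::move(factory);
  }

  // Works on a const handle: it fills the shared cell, the handle itself is unchanged.
  int get() const
  {
    if (!cell_->value) { cell_->value = std::make_shared<int>(cell_->factory()); }
    return *cell_->value;
  }

  // Needs a non-const handle: it re-points this handle at a fresh cell.
  void set(int v)
  {
    auto fresh   = std::make_shared<lazy_cell>();
    fresh->value = std::make_shared<int>(v);
    cell_        = std::move(fresh);
  }

 private:
  std::shared_ptr<lazy_cell> cell_;
};

// Returns how many times the factory ran after both copies read the resource.
int count_initializations()
{
  int calls = 0;
  lazy_handle a([&calls] { return ++calls; });
  lazy_handle b = a;               // copy made before anything is initialized
  const lazy_handle& b_const = b;
  b_const.get();                   // lazy initialization happens through the copy
  a.get();                         // the other handle sees the already-created value
  return calls;
}

// Checks that a setter call on one copy does not change the other copy.
bool setter_is_local()
{
  lazy_handle a([] { return 1; });
  lazy_handle b = a;
  b.set(7);
  return a.get() == 1 && b.get() == 7;
}
```

In this sketch the factory runs once even though the resource is first requested through the copy, and set only affects the handle it is called on.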

Thread-safety

First and foremost, the thread-safety of using a specific resource depends on that resource. Here we discuss the thread-safety of using the raft::resources handle itself (the get_resource and set_resource functions).

Accessing the same resource by const ref

Updates (set_resource) are not possible (a user should copy the handle and modify the copy). All concurrent get_resource calls are atomic and safe, even if they initialize the factories and resources under the hood. The worst that can happen is that the same resource is initialized concurrently in two threads, but only one instance is stored in the handle (the other one is discarded).
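
The "initialize twice, store once" case can be sketched as follows. This is an illustration only: toy_resource, toy_slot and publish_or_discard are hypothetical names, and the sketch uses the older free-function atomic operations on std::shared_ptr (deprecated in C++20 in favor of std::atomic<shared_ptr>) so that it also compiles before C++20. The race is replayed sequentially rather than with real threads, so the outcome is deterministic.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct toy_resource {
  int id;
};

// Hypothetical slot holding one lazily initialized resource.
struct toy_slot {
  std::shared_ptr<toy_resource> value;  // written only through the atomic free functions
};

// Publish `candidate` only if the slot is still empty; return whatever ends up stored.
std::shared_ptr<toy_resource> publish_or_discard(toy_slot& slot,
                                                  std::shared_ptr<toy_resource> candidate)
{
  std::shared_ptr<toy_resource> expected;  // empty means "nobody has initialized yet"
  if (std::atomic_compare_exchange_strong(&slot.value, &expected, candidate)) {
    return candidate;  // this caller won the race
  }
  return expected;  // this caller lost: `expected` now holds the winner
}

// Replays the interleaving: both "threads" saw an empty slot and each built
// its own resource before either of them published.
bool loser_is_discarded()
{
  toy_slot slot;
  auto built_by_t1 = std::make_shared<toy_resource>(toy_resource{1});
  auto built_by_t2 = std::make_shared<toy_resource>(toy_resource{2});
  std::weak_ptr<toy_resource> watch_t2 = built_by_t2;

  auto seen_by_t1 = publish_or_discard(slot, std::move(built_by_t1));
  auto seen_by_t2 = publish_or_discard(slot, std::move(built_by_t2));

  // Both callers end up with the same stored resource; the second one was freed.
  return seen_by_t1 == seen_by_t2 && seen_by_t1->id == 1 && watch_t2.expired();
}
```

The losing candidate is destroyed as soon as its last owner goes out of scope, which is why a resource whose construction has side effects would still need care under this scheme.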

Accessing the same resource by non-const ref

Using the same object by non-const ref from multiple threads is always unsafe.

Accessing copies of the same resource

The resources and factories are updated atomically. Modifying a resource through one handle doesn't propagate to the copies. Accessing a resource while another thread accesses or modifies the same resource via another handle is thus safe.

Implementation details

The PR adds one more layer of indirection (one extra shared_ptr) to each resource, which may add some runtime overhead. This is unavoidable if we want lazily initialized resources to back-propagate across handles.
On the other hand, the PR removes the handle mutex in favor of C++20 std::atomic<shared_ptr>, which may reduce the runtime overhead slightly.

Breaking changes

All resource setters change their function signatures; this is a big breaking change. However, all known use sites already call the resource setters on non-const raft::resources handles.
std::atomic<shared_ptr> requires enforcing C++20 for all dependent libraries. An alternative would be to put a mutex per raft::resource_cell.
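
For comparison, the mutex-per-cell alternative could look roughly like the sketch below. It is an assumption about what such a design might be, not code from the PR: toy_locked_cell and its members are hypothetical names, and an int stands in for the resource.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical cell that guards its lazily created value with its own mutex.
class toy_locked_cell {
 public:
  explicit toy_locked_cell(std::function<int()> factory) : factory_(std::move(factory)) {}

  // The lock serializes initialization, so the factory runs at most once.
  std::shared_ptr<int> get()
  {
    std::lock_guard<std::mutex> lock(mutex_);
    if (!value_) { value_ = std::make_shared<int>(factory_()); }
    return value_;
  }

 private:
  std::mutex mutex_;
  std::function<int()> factory_;
  std::shared_ptr<int> value_;
};

// Returns how many times the factory ran after three reads of the same cell.
int factory_runs_after_three_gets()
{
  int runs = 0;
  toy_locked_cell cell([&runs] {
    ++runs;
    return 5;
  });
  cell.get();
  cell.get();
  cell.get();
  return runs;
}
```

Unlike a lock-free publish, this variant never builds a resource that is later discarded, at the cost of taking a lock on every access.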

copy-pr-bot · 2026-04-29T11:01:40Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

achirkin · 2026-04-29T11:09:05Z

/ok to test

achirkin · 2026-04-29T12:12:30Z

/ok to test

jamxia155 · 2026-04-29T12:36:57Z

+  // Move construct
+  raft::resources b(std::move(a));
+  ASSERT_TRUE(b.has_resource_factory(resource::resource_type::CUDA_STREAM_VIEW));
+


Would it make sense to have another block like

// check moved-from object
ASSERT_FALSE(a.has_resource_factory(resource::resource_type::CUDA_STREAM_VIEW));

and similarly, after b has been moved into c?

My concern is that if raft::resources lacked move constructor and assignment altogether, this test would still pass as the construction and assignment can fall back to copying.

Hmm, a moved-from object should be in a valid but unspecified state, so technically we cannot rely on ASSERT_FALSE. Since we know our implementation is std::vector<std::shared_ptr>, it is indeed false (the pointers become empty after the move), but... does it really matter for us?

I just thought it would ensure that a move, not a copy, has in fact taken place.
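
The concern in this thread can be reproduced with a small standalone sketch. The two handle types are hypothetical and only mimic the std::vector<std::shared_ptr> layout mentioned above; the check on the moved-from object relies on the std::vector move constructor leaving its source empty.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical handle that declares only copy construction, so the compiler
// does not generate a move constructor for it.
struct copy_only_handle {
  std::vector<std::shared_ptr<int>> cells{std::make_shared<int>(42)};
  copy_only_handle() = default;
  copy_only_handle(const copy_only_handle&) = default;
};

// Hypothetical handle with the implicitly generated move constructor.
struct movable_handle {
  std::vector<std::shared_ptr<int>> cells{std::make_shared<int>(42)};
};

// "Move construction" compiles but silently copies: the source keeps its cell.
bool source_survives_copy_fallback()
{
  copy_only_handle a;
  copy_only_handle b(std::move(a));
  return a.cells.size() == 1 && b.cells.size() == 1 && a.cells[0] == b.cells[0];
}

// A real move hands the vector's storage to the destination and empties the source.
bool source_emptied_by_real_move()
{
  movable_handle a;
  movable_handle b(std::move(a));
  return a.cells.empty() && b.cells.size() == 1;
}
```

A test that inspects only the destination passes for both types; only a check on the source distinguishes them.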

jamxia155 · 2026-04-29T13:24:35Z

+   * Creates a new resource_cell with the given factory.  Other copies of this
+   * handle continue to point at the old cell, so the change does not propagate.


Would it make sense to enforce that add_resource_factory is only used with copies of a handle? Perhaps we could add a flag is_copy in resources that is default-initialized to false but gets set to true by the copy constructor, then add a RAFT_EXPECTS(is_copy) in add_resource_factory. This way, there's no risk of accidentally overwriting the resources in the original handle:

ensure_resource_factory for when you want to update the original handle (works on either original or copy).

add_resource_factory for when you want to make an isolated change to a copy of a handle (does not work on original).

I think we shouldn't distinguish between "original" and "copied" handles; it doesn't really matter which one came first. Also, one copy cannot overwrite a resource in another: add_resource_factory simply replaces the outer shared_ptr with one that points to a new resource_cell (NB: the copy constructor of raft::resources copies the contents of the cells_ vector, same as before; the outer shared_ptrs are not shared among resources handles).

Okay, it's probably an overreach to make that distinction. Thanks for elaborating.
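
The copy and replace behavior explained in this thread can be modeled with a short sketch. It is a simplified illustration, not the raft code: toy_resources, toy_resource_cell, add_factory and cell are hypothetical names, and the sketch assumes only what the reply states, namely that each handle owns its own vector of shared_ptrs and that the cells they point to are shared between copies.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical cell; a real one would also hold the lazily created resource.
struct toy_resource_cell {
  std::function<int()> factory;
};

// Hypothetical handle: owns its own vector of pointers to (possibly shared) cells.
class toy_resources {
 public:
  explicit toy_resources(std::size_t n_slots)
  {
    for (std::size_t i = 0; i < n_slots; ++i) {
      cells_.push_back(std::make_shared<toy_resource_cell>());
    }
  }
  // The implicit copy constructor copies the vector: new pointer objects, same cells.

  // Replace this handle's pointer for `slot` with a pointer to a brand-new cell.
  void add_factory(std::size_t slot, std::function<int()> factory)
  {
    auto fresh     = std::make_shared<toy_resource_cell>();
    fresh->factory = std::move(factory);
    cells_[slot]   = std::move(fresh);
  }

  const toy_resource_cell* cell(std::size_t slot) const { return cells_[slot].get(); }

 private:
  std::vector<std::shared_ptr<toy_resource_cell>> cells_;
};

bool replacement_stays_local()
{
  toy_resources a(2);
  toy_resources b = a;
  const bool shared_before = a.cell(0) == b.cell(0) && a.cell(1) == b.cell(1);

  const toy_resource_cell* old_cell = a.cell(0);
  b.add_factory(0, [] { return 7; });

  const bool a_untouched             = a.cell(0) == old_cell;     // a still sees the old cell
  const bool b_repointed             = b.cell(0) != old_cell;     // b sees the new cell
  const bool other_slot_still_shared = a.cell(1) == b.cell(1);    // untouched slots stay shared
  return shared_before && a_untouched && b_repointed && other_slot_still_shared;
}
```

Because the replacement only touches the pointer stored in one handle, neither handle is "the original" in any meaningful sense.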

achirkin · 2026-04-29T16:53:56Z

Because of requiring C++20, this PR is blocked on completing the migration of cuVS to C++20 (see PRs #1795, #1796, #1799).

tarang-jain · 2026-05-01T00:51:00Z

Note that rapidsai/cuvs#1796 is blocked. https://nvbugspro.nvidia.com/bug/5356084 is the existing issue for the 12.9 compiler issue. Nothing we can do as the fix was merged into 13.1.

achirkin · 2026-05-06T14:20:48Z

/ok to test

Lazy-initialization propagates through copies

6be7b6c

achirkin self-assigned this Apr 29, 2026

achirkin added the improvement (Improvement / enhancement to an existing function) and breaking (Breaking change) labels Apr 29, 2026

github-project-automation Bot added this to Unstructured Data Processing Apr 29, 2026

Require C++20

21b7733

achirkin moved this to In Progress in Unstructured Data Processing Apr 29, 2026

jamxia155 reviewed Apr 29, 2026

achirkin mentioned this pull request Apr 29, 2026

Test against RAFT #3005 rapidsai/cuvs#2043

Open

achirkin moved this from In Progress to Blocked in Unstructured Data Processing Apr 29, 2026

Merge branch 'main' into enh-predictable-resources

7a4d525

This was referenced May 6, 2026

Test against raft #3005 rapidsai/cuml#8053

Draft

Test against raft #3005 rapidsai/cugraph#5507

Draft

Merge branch 'main' into enh-predictable-resources

3afd7be

achirkin wants to merge 4 commits into rapidsai:main from achirkin:enh-predictable-resources

3 participants
