Predictable raft::resources#3005
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
/ok to test |
|
/ok to test |
| // Move construct | ||
| raft::resources b(std::move(a)); | ||
| ASSERT_TRUE(b.has_resource_factory(resource::resource_type::CUDA_STREAM_VIEW)); | ||
|
|
There was a problem hiding this comment.
Would it make sense to have another block like
// check moved-from object
ASSERT_FALSE(a.has_resource_factory(resource::resource_type::CUDA_STREAM_VIEW));
and similarly, after b has been moved into c?
My concern is that if raft::resources lacked move constructor and assignment altogether, this test would still pass as the construction and assignment can fall back to copying.
There was a problem hiding this comment.
Hmm, moved-from should have valid but unspecified state, so technically we cannot rely on ASSERT_FALSE. Since we know our implementation is std::vector<std::shared_ptr>, it is indeed false (the pointers become empty after move), but... does it really matter for us?
There was a problem hiding this comment.
I just thought it would ensure that a move, not a copy, has in fact taken place.
| * Creates a new resource_cell with the given factory. Other copies of this | ||
| * handle continue to point at the old cell, so the change does not propagate. |
There was a problem hiding this comment.
Would it make sense to enforce that add_resource_factory is only used with copies of a handle? Perhaps we could add a flag is_copy in resources that is default-initialized to false but gets set to true by the copy constructor, then add a RAFT_EXPECTS(is_copy) in add_resource_factory. This way, there's no risk of accidentally overwriting the resources in the original handle:
ensure_resource_factoryfor when you want to update the original handle (works on either original or copy).add_resource_factoryfor when you want to make an isolated change to a copy of a handle (does not work on original).
There was a problem hiding this comment.
I think we shouldn't distinguish between the "original" and "copied" handles - it doesn't really matter who's the first. Also it's not possible to overwrite a resource from one copy to another: add_resource_factory simply replaces the outer shared_ptr to a one that points to a new resource_cell (NB: copy constructor of the raft::resources copies the contents of the cells_ vector same as before; the outer shared_ptrs are not shared among resources handles).
There was a problem hiding this comment.
Okay it's probably an overreach to make that distinction. Thanks for elaborating.
|
Note that rapidsai/cuvs#1796 is blocked. https://nvbugspro.nvidia.com/bug/5356084 is the existing issue for the 12.9 compiler issue. Nothing we can do as the fix was merged into 13.1. |
|
/ok to test |
Make the
raft::resourcesresource initialization semantics more predictable.raft::resourceshandle construction: all copies of theraft::resourceshandle point to the same resources (not a breaking change, fixes the re-initialization issue).set_resourcechanges to non-const semantics (breaking change).Before: all
set_xxxresource-updating calls were operating on const handleNow: all
set_xxxresources require a non-const handle.Thread-safety
First and foremost, thread-safety of using a specific resource depends on that resource. Here we discuss the thread-safety of using
raft::resourceshandle itself (get_resource and set_resource functions).Accessing the same resource by const ref
Updates (set_resource) are not possible (a user should copy a handle and modify the new one). All concurrent get_resource calls are atomic and safe, even if they initialize the factories and resources under the hood. The worst can happen is the same resource being initialized concurrently in two threads but only one being stored in the handle (the other one is discarded).
Accessing the same resource by non-const ref
Using the same object by non-const ref from multiple threads is always unsafe.
Accessing copies of the same resource
The resources and factories are updated atomically. Modifying any resource doesn't propagate to the copies. Accessing a resource while another threads access or modifies the same resource via another handle is thus safe.
Implementation details
The PR adds one more layer of indirection (one extra shared_ptr) to each resource, which may cause an extra runtime overhead. This is unavoidable if we want to allow lazy-initialized resources back-propagate across handles.
On the other hand, the PR removes the handle mutex in favor of C++20
std::atomic<shared_ptr>, which may reduce the runtime overheads a little bit.Breaking changes
raft::resourceshandles.std::atomic<shared_ptr>requires enforcing C++20 for all dependent libraries. An alternative would be to put a mutex perraft::resource_cell.