Dry Run Protocol #2961
Open
achirkin wants to merge 86 commits into rapidsai:main from achirkin:fea-dry-run-protocol
Commits
99dc47f
Add dry-run memory resources for allocation profiling without real me…
achirkin 695a8a3
First batch of dry-run guards
achirkin 42d8ad4
Dry run compliance for raft::linalg namespace
achirkin 6db7ec8
Update developer guide with the dry run protocol
achirkin d91a1c6
BREAKING CHANGE: replaced pinned_container with host_container using …
achirkin 1a114f6
Dry run compliance for raft::matrix namespace
achirkin dec5e95
Dry run compliance for raft::random namespace
achirkin f84d9a9
Dry run compliance for raft::solver namespace
achirkin 44793cd
Dry run compliance for raft::sparse namespace
achirkin d566fe9
Dry run compliance for raft::spectral namespace
achirkin fc3bde6
Dry run compliance for raft::stats namespace
achirkin b0ddbc8
Add a little bit more tests
achirkin 15c07a1
Add the Dry Run Protocol Overview
achirkin 1c57abb
Fix C++ example in the docs
achirkin d916b45
Merge branch 'main' into fea-dry-run-protocol
achirkin 9d24480
Add a few more tests and fix a missed CUDA call in QR algorithm
achirkin 7577e56
Fix excess subsample doing work in dry run
achirkin 99faf68
Add dry run compliance to the raft::copy on mdspans
achirkin b859894
Merge branch 'main' into fea-dry-run-protocol
achirkin 57d4c19
Revert changing includes from public to detail namespace to avoid bre…
achirkin 694ec63
Merge branch 'main' into fea-dry-run-protocol
achirkin a2dd18c
Merge rapidsai/main into fea-dry-run-protocol
achirkin 45e2d49
Rename device_uvector_policy -> device_container_policy and add non-i…
achirkin 65d4570
Declare the new resources in raft handle
achirkin d86638f
Renamed managed policy
achirkin d6788f6
Add raft::resources for pinned and managed resources and the type-era…
achirkin e7bea48
Updated container policies
achirkin 2514621
All but host memory resource are done
achirkin 49735a5
Simplify the implementation
achirkin 22b4048
Make the host container policy use the resource concept
achirkin 557cc8c
Settle down with raft::mr::*et_default_host_resource()
achirkin cc7a4b0
Add some thread-safety
achirkin e77fe2a
Merge branch 'main' into fea-unify-memory-resources
achirkin 8922b8f
Merge branch 'main' into fea-dry-run-protocol
achirkin 866211e
C++17 backwards-compatibility
achirkin c171d84
Merge branch 'main' into fea-unify-memory-resources
achirkin 268eb1b
newline
achirkin 5c718d6
Add raft::mr::device_resource wrapper for cuda::mr::any_resource
achirkin c5ab9c4
Copy semantics and return resource refs
achirkin 6af142e
Rework workspace resources to avoid nesting bridge layers
achirkin ece1990
Fix the argument order in tests
achirkin 3c17e3e
Merge branch 'main' into fea-dry-run-protocol
achirkin 4dd256b
Merge branch 'main' into fea-unify-memory-resources
achirkin a26357d
Add explicit conversion through cuda::mr refs to rmm ref
achirkin 2a90680
Switch from rmm host and host_device resource reference wrappers to r…
achirkin 59c3793
Merge branch 'main' into fea-unify-memory-resources
achirkin 3a40d22
Prefer rmm::mr::get_current_device_resource_ref() over rmm::mr::get_c…
achirkin cce4f45
Remove raft pinned and managed memory resources in favor of cuda::mr …
achirkin fb56025
Merge branch 'main' into fea-dry-run-protocol
achirkin ff20962
Merge fea-unify-memory-resources into fea-dry-run-protocol
achirkin e76bf7c
Adapt to fea-unify-memory-resources
achirkin 2d3f8fc
Refactor dry_run_resources as a child of raft::resources to better ke…
achirkin d2cf85e
Merge branch 'main' into fea-dry-run-protocol
achirkin e86b56d
Merge branch 'main' into fea-dry-run-protocol
achirkin d9a0abf
Fix style after merge commit
achirkin 16324fb
Merge branch 'main' into fea-dry-run-protocol
achirkin ced0e6e
Fix merge commit typo
achirkin c9bf618
Merge branch 'main' into fea-dry-run-protocol
achirkin 1acf6cf
Fix some sparse routines not being dry-run compliant
achirkin 2326061
Unify the looks of the three custom raft::resources
achirkin d4ff16e
Expand test coverage Part 1
achirkin fee4b62
Expand test coverage Part 2
achirkin f1b7aca
Update docs to reflect unify memory resources PR changes
achirkin 156a437
Fix segfault in sparse tests caused by invalid thrust exec policy
achirkin 51c0b16
Better allocation estimates in the sparse namespace
achirkin c99b879
Fixing more failing tests
achirkin e535124
Fixing last failing tests
achirkin 9971c71
Merge branch 'main' into fea-dry-run-protocol
achirkin b682d46
Fix not initialize the mdarray scalars only in dry run mode
achirkin 69543a1
Clarify that all workspace resources are actually counted independent…
achirkin 5db5727
Rename the dry_run_resources header file for consistency
achirkin d1cf594
Dry-run compliance for coo_sort
achirkin e47c41f
Fix the expected minimum allocation calculation
achirkin 3d93b0f
Merge branch 'main' into fea-dry-run-protocol
achirkin e60a048
Merge branch 'main' into fea-dry-run-protocol
achirkin f8754d9
Merge branch 'main' into fea-dry-run-protocol
achirkin 0f65503
Merge branch 'main' into fea-dry-run-protocol
achirkin 969b868
Merge remote-tracking branch 'rapidsai/main' into fea-dry-run-protocol
achirkin 217ca58
Fix tests after rmm breaking change
achirkin e72e872
Store the device resources by values to safely keep them alive while …
achirkin 1c4deb8
Switch to owning semantics for both host and per-device resources
achirkin 8fdf194
Don't let allocations cross dry-run/normal scopes
achirkin 1a7501c
More thorough tests for bitset/bitmap in dry run mode
achirkin 65dda3b
Merge branch 'main' into fea-dry-run-protocol
achirkin a710e97
make bitset.count() dry-run-compliant
achirkin 3a65638
Merge branch 'main' into fea-dry-run-protocol
achirkin
@@ -0,0 +1,89 @@

```cpp
/*
 * SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION.
 * SPDX-License-Identifier: Apache-2.0
 */
#pragma once

#include <raft/core/resource/resource_types.hpp>
#include <raft/core/resources.hpp>

#include <memory>

namespace raft::resource {

/**
 * @defgroup dry_run_flag Dry-run flag resource
 * @{
 */

/**
 * @brief Resource that holds a boolean dry-run flag.
 *
 * When the dry-run flag is set, algorithms should skip kernel execution
 * and only perform allocations to measure memory usage.
 */
class dry_run_flag_resource : public resource {
 public:
  dry_run_flag_resource() = default;
  explicit dry_run_flag_resource(bool value) : flag_(value) {}
  ~dry_run_flag_resource() override = default;

  auto get_resource() -> void* override { return &flag_; }

  void set(bool value) { flag_ = value; }
  [[nodiscard]] auto get() const -> bool { return flag_; }

 private:
  bool flag_{false};
};

/**
 * @brief Factory that creates a dry_run_flag_resource.
 */
class dry_run_flag_resource_factory : public resource_factory {
 public:
  explicit dry_run_flag_resource_factory(bool initial_value = false) : initial_value_(initial_value)
  {
  }

  auto get_resource_type() -> resource_type override { return resource_type::DRY_RUN_FLAG; }
  auto make_resource() -> resource* override { return new dry_run_flag_resource(initial_value_); }

 private:
  bool initial_value_;
};

/**
 * @brief Get the dry-run flag from a resources handle.
 *
 * @param res raft resources object
 * @return true if dry-run mode is active
 */
inline auto get_dry_run_flag(resources const& res) -> bool
{
  if (!res.has_resource_factory(resource_type::DRY_RUN_FLAG)) {
    res.add_resource_factory(std::make_shared<dry_run_flag_resource_factory>());
  }
  return *res.get_resource<bool>(resource_type::DRY_RUN_FLAG);
}

/**
 * @brief Set the dry-run flag on a resources handle.
 *
 * @param res raft resources object
 * @param value true to enable dry-run mode, false to disable
 */
inline void set_dry_run_flag(resources const& res, bool value)
{
  if (!res.has_resource_factory(resource_type::DRY_RUN_FLAG)) {
    res.add_resource_factory(std::make_shared<dry_run_flag_resource_factory>(value));
  } else {
    // The resource may already be instantiated; update it directly
    auto* flag = res.get_resource<bool>(resource_type::DRY_RUN_FLAG);
    *flag = value;
  }
}

/** @} */

}  // namespace raft::resource
```
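For readers unfamiliar with the idea, here is a minimal, self-contained sketch of how an algorithm might consult such a flag: the allocation request is always recorded, while the compute step is skipped in dry-run mode. It does not use RAFT at all; `mock_resources`, `make_workspace`, and `sum_of_squares` are hypothetical names invented for illustration, and the host-side `std::vector` merely stands in for a device workspace.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a resources handle: a dry-run flag plus a
// counter of the bytes that algorithms have asked for.
struct mock_resources {
  bool dry_run                = false;
  std::size_t bytes_requested = 0;
};

// Hypothetical workspace helper: the request is always recorded, but real
// memory is only allocated outside dry-run mode.
inline std::vector<float> make_workspace(mock_resources& res, std::size_t n)
{
  res.bytes_requested += n * sizeof(float);
  return res.dry_run ? std::vector<float>{} : std::vector<float>(n);
}

// Example algorithm: the allocation path runs in both modes; the compute
// step (standing in for a kernel launch) is guarded by the flag.
inline float sum_of_squares(mock_resources& res, std::vector<float> const& in)
{
  auto tmp = make_workspace(res, in.size());
  if (res.dry_run) { return 0.0f; }  // skip the "kernel" in dry-run mode

  for (std::size_t i = 0; i < in.size(); ++i) {
    tmp[i] = in[i] * in[i];
  }
  float acc = 0.0f;
  for (float v : tmp) {
    acc += v;
  }
  return acc;
}
```

Under these assumptions, a caller would set `dry_run = true`, run the algorithm once to read `bytes_requested`, and then rerun it with the flag cleared to get the real result.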
Oh, I didn't realize we were adding this as a new resource. This would make it hard to use the dry-run for pre-initializing resources.
Yes, but that's fine! We can still push the initialized resources back to the original resources handle when the dry-run resources object is destroyed.
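One way to picture the "push back on destruction" idea from this reply is the RAII sketch below. It is only an illustration under stated assumptions, not the PR's implementation: `mock_handle` and `mock_dry_run_handle` are hypothetical types, and lazily created resources are modelled as named `std::shared_ptr<int>` slots.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a resources handle: named resources that are
// created lazily on first access.
struct mock_handle {
  std::map<std::string, std::shared_ptr<int>> slots;
  bool dry_run = false;

  std::shared_ptr<int> get(std::string const& name)
  {
    auto& slot = slots[name];
    if (!slot) { slot = std::make_shared<int>(0); }
    return slot;
  }
};

// Hypothetical dry-run handle: starts as a copy of the parent with the flag
// set, and hands any resources it initialized back to the parent when it is
// destroyed.
class mock_dry_run_handle : public mock_handle {
 public:
  explicit mock_dry_run_handle(mock_handle& parent) : mock_handle(parent), parent_(parent)
  {
    dry_run = true;
  }

  ~mock_dry_run_handle()
  {
    for (auto const& [name, slot] : slots) {
      // emplace leaves entries the parent already owns untouched
      parent_.slots.emplace(name, slot);
    }
  }

 private:
  mock_handle& parent_;
};
```

In this sketch the parent's own flag is never modified, so code holding the original handle keeps running normally while the dry-run scope is alive.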