Topology-aware default pool sizing for PinnedMemoryResource #1012

rapids-bot[bot] merged 22 commits into rapidsai:main from
Signed-off-by: niranda perera <niranda.perera@gmail.com>
Force-pushed from ce335cb to 7bbd171
```cpp
std::uint64_t get_host_memory_per_gpu() {
    auto const num_gpus = get_topology().num_gpus;
    return get_total_host_memory() / std::max<std::uint64_t>(1, num_gpus);
}
```
Can you double-check how get_total_host_memory behaves on devices where GPU memory is also exposed as NUMA nodes? In those cases, `free` will also show GPU memory as part of system memory, which may break the assumption that get_total_host_memory reports only system memory.
IIRC, the conditions for GPU memory to be exposed as system memory are:
- Coherent system (e.g., Grace-Hopper/Grace-Blackwell)
- NVIDIA driver in NUMA mode (not CDMM)
- HMM enabled on the kernel
Ah! this is a good point.
Nice that there was a simple solution. I also hadn't thought of the bigger picture of the problem: we cannot just take the whole host memory and divide it equally among GPUs, since their NUMA nodes may have different amounts of memory. It looks like your new solution already ensures that.
madsbk
left a comment
Looks good, I only have minor stuff
Co-authored-by: Mads R. B. Kristensen <madsbk@gmail.com>
Signed-off-by: niranda perera <niranda.perera@gmail.com>
/merge
pentschev
left a comment
Please update the description to reflect recent changes, @nirandaperera.
Temporarily blocking to ensure the PR description and the Python get_host_memory_per_gpu() are updated.
I think there's some partial GH UI breakage, I cannot comment on the files now for some reason, but
```cpp
using rapidsmpf::safe_cast;
```
```cpp
// When the RAPIDSMPF_SMOKE_TEST_MODE env var is set (to any value), each
```
This is a bad pattern. Someone will almost certainly attempt RAPIDSMPF_SMOKE_TEST_MODE=0 one day and be confused when it still enables smoke-test mode. A better approach would be adding a CLI argument, like all the other bench_* binaries we have.
@pentschev Well, this is a bit tricky. Passing custom args might not work here, because in google benchmark, Apply(...) runs during static initialization. So to make it work, we would have to parse args before registering benchmarks. That means we would need a custom main(), parse the smoke-test arg, and then move all benchmark registrations after that. I'd rather keep it simple like this.
```cpp
// We use an env var rather than a CLI flag because google-benchmark's
// BENCHMARK(...)->Apply(...) macros run during static initialization, before
// main() has a chance to parse argv. A CLI-flag approach would require moving
// every benchmark registration into main() (via benchmark::RegisterBenchmark),
// which is more invasive. std::getenv works fine during static init.
```
/merge
jameslamb
left a comment
Approving for ci-codeowners / packaging-codeowners; the changes that pulled in those groups look non-controversial.
Replace the hardcoded `pinned_initial_pool_size = 0` / `pinned_max_pool_size = unbounded` defaults with values scaled to the system's GPU count.

Changes

- `get_host_memory_per_gpu()`: new utility in `system_info` that returns (current NUMA node total host memory) / (N GPUs in current NUMA node).
- `PinnedMemoryResource`: two new named constants drive the defaults:
  - `DefaultInitiPoolSizeFactor = "10%"` → initial pool = 10% of per-GPU host memory
  - `DefaultMaxPoolSizeFactor = "80%"` → max pool = 80% of per-GPU host memory
  Also corrects the documented default from `false` to `true`.
- Python bindings: exposes `get_host_memory_per_gpu()` via `system_info.pyx`/`.pyi`.
- Tests: `PinnedResource.from_default_options` verifies pool sizes match the factor-scaled values when no options are provided.
- Benchmark: benchmark initialization time against initial pool size - Results
- Docs: `configuration.md` updated to reflect the new defaults.