Skip to content

Fix large-message initialization in communication benchmark#1016

Open
pentschev wants to merge 1 commit intorapidsai:mainfrom
pentschev:fix-bench-comm-message-limit
Open

Fix large-message initialization in communication benchmark#1016
pentschev wants to merge 1 commit intorapidsai:mainfrom
pentschev:fix-bench-comm-message-limit

Conversation

@pentschev
Copy link
Copy Markdown
Member

bench_comm could fail while initializing input buffers for per-peer messages at around 8 GiB and larger, before the benchmark itself ran. This change allows large-message all-to-all cases to run correctly.

`bench_comm` could fail while initializing input buffers for per-peer
messages at around 8 GiB and larger, before the benchmark itself ran.
This change allows large-message all-to-all cases to run correctly.
@pentschev pentschev self-assigned this May 6, 2026
@pentschev pentschev requested a review from a team as a code owner May 6, 2026 21:04
@pentschev pentschev added bug Something isn't working non-breaking Introduces a non-breaking change labels May 6, 2026
Comment on lines +31 to +34
RAPIDSMPF_EXPECTS(
nelem <= static_cast<std::size_t>(std::numeric_limits<index_t>::max()),
"random_device_vector size exceeds signed iterator range"
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit. Do we consider 32-bit systems? For 64 bit nelem > INT64_MAX seems unlikely isnt it?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always use safe_cast()

}

std::unique_ptr<cudf::column> random_column(
cudf::size_type nrows,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldnt this be std::size_t now to reflect the above change? Otherwise we will be clipping values above UINT32_MAX, isnt it?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't make a column with more than size_type::max rows, so I think the point is moot?

Comment on lines -75 to +90
buffer.size / sizeof(std::int32_t) + sizeof(std::int32_t),
(buffer.size + sizeof(std::int32_t) - 1) / sizeof(std::int32_t),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if buffer is empty now, we will be passing 0 to random_device_vector. I think random_fill was trying to prevent that before? I'm not sure.


cudf::table random_table(
cudf::size_type ncolumns,
cudf::size_type nrows,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working non-breaking Introduces a non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants