Fix off-by-one num_segments in matrix::sort_cols_per_row #3010

Open
viclafargue wants to merge 1 commit into rapidsai:main from viclafargue:fix-sort-columns-per-row

viclafargue (Contributor) commented May 4, 2026

Closes #2049

raft::matrix::detail::sortColumnsPerRow passed n_rows + 1 to cub as num_segments. Per the cub contract, an aliased CSR offsets array must have length num_segments + 1, which here means n_rows + 2 entries, so cub read one int past the end of the offsets allocation. The out-of-bounds read crashes with cudaErrorIllegalAddress when n_rows + 1 is a power of two ≥ 64 and n_columns ≥ ~16384; otherwise it goes unnoticed.

The bug surfaces in cuvs::stats::trustworthiness_score whenever n % batch_size ∈ {63, 127, 255, …} (e.g. n=76927, batch_size=512, where n % batch_size = 127).
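The remainder condition above can be checked with a few lines of host-side C++. This is only an illustrative sketch: the helper names (`is_power_of_two`, `tail_rows`, `tail_matches_crash_pattern`) are hypothetical and not part of raft or cuvs, and it assumes the remainder n % batch_size is the row count that reaches the sort. It tests the row condition only, not the n_columns ≥ ~16384 requirement.

```cpp
#include <cassert>
#include <cstdint>

// True for 1, 2, 4, 8, ...
constexpr bool is_power_of_two(std::int64_t x)
{
  return x > 0 && (x & (x - 1)) == 0;
}

// Remainder left after splitting n samples into batches of batch_size.
// Assumption: this remainder is the n_rows value that reaches the sort.
constexpr std::int64_t tail_rows(std::int64_t n, std::int64_t batch_size)
{
  return n % batch_size;
}

// Row condition from the description: n_rows + 1 is a power of two >= 64.
constexpr bool tail_matches_crash_pattern(std::int64_t n, std::int64_t batch_size)
{
  return tail_rows(n, batch_size) + 1 >= 64 &&
         is_power_of_two(tail_rows(n, batch_size) + 1);
}

// The example from the description: 76927 % 512 == 127, and 127 + 1 == 128.
static_assert(tail_rows(76927, 512) == 127, "remainder is 127");
static_assert(tail_matches_crash_pattern(76927, 512), "128 is a power of two >= 64");
// One more sample gives a remainder of 128, and 129 is not a power of two.
static_assert(!tail_matches_crash_pattern(76928, 512), "129 is not a power of two");
```

A check like this can help pick values of n for a regression test that exercises the affected remainders.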

Fix: pass n_rows to cub as num_segments; size the offsets array as n_rows + 1.
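The indexing argument behind the fix can be sketched on the host, without cub. The function names below (`make_row_offsets`, `highest_offset_index_read`, `offsets_in_bounds`) are hypothetical and are not the raft implementation. The sketch assumes the begin and end offsets alias one array (begin = offsets, end = offsets + 1), so segment i reads offsets[i] and offsets[i + 1].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major matrix viewed as n_rows segments of n_columns elements each:
// the CSR-style offsets array has n_rows + 1 entries (0, n_columns, 2 * n_columns, ...).
std::vector<int> make_row_offsets(int n_rows, int n_columns)
{
  assert(n_rows >= 0 && n_columns >= 0);
  std::vector<int> offsets(static_cast<std::size_t>(n_rows) + 1);
  for (std::size_t i = 0; i < offsets.size(); ++i) {
    offsets[i] = static_cast<int>(i) * n_columns;
  }
  return offsets;
}

// With aliased offsets, segment i reads offsets[i] and offsets[i + 1], so over
// i in [0, num_segments) the highest index touched is num_segments.
constexpr std::size_t highest_offset_index_read(std::size_t num_segments)
{
  return num_segments;
}

// True when every offset read stays inside an allocation of offsets_len entries.
constexpr bool offsets_in_bounds(std::size_t offsets_len, std::size_t num_segments)
{
  return num_segments == 0 || highest_offset_index_read(num_segments) < offsets_len;
}

// Illustrative sizes: 127 rows, hence 128 offsets.
constexpr std::size_t example_rows = 127;
// Old call: num_segments = n_rows + 1 reads index 128 of a 128-entry array.
static_assert(!offsets_in_bounds(example_rows + 1, example_rows + 1), "out of bounds");
// Fixed call: num_segments = n_rows reads at most index 127.
static_assert(offsets_in_bounds(example_rows + 1, example_rows), "in bounds");
```

With num_segments = n_rows the last segment's end offset is offsets[n_rows], which is the final entry of an (n_rows + 1)-entry array.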

viclafargue self-assigned this May 4, 2026
viclafargue requested a review from a team as a code owner May 4, 2026 13:25
viclafargue added the bug (Something isn't working) and non-breaking (Non-breaking change) labels May 4, 2026
Development

Successfully merging this pull request may close these issues.

[BUG] cuml.metrics.trustworthiness crashes with cudaErrorIllegalAddress for specific values of n
