Fix signed-int overflow in SamplingState::Init to prevent heap-buffer-overflow by apsonawane · Pull Request #29443 · microsoft/onnxruntime

apsonawane · 2026-06-30T17:40:18Z

This pull request improves the safety of buffer size calculations in the SamplingState initialization logic by ensuring that all multiplications involving batch_size and vocab_size are safely performed using SafeInt<size_t>. This prevents potential integer overflow bugs that could lead to under-allocated buffers and memory errors.

Buffer allocation safety improvements:

All buffer size calculations that multiply batch_size and vocab_size now use SafeInt<size_t> to ensure checked arithmetic, preventing silent integer overflows that could cause heap-buffer-overflow issues. This includes allocations for both CPU and CUDA buffers in SamplingState. [1] [2]
The calculation for the buffer size of h_sampled_all now also safely casts max_iter to size_t before multiplication, further protecting against overflow.

These changes make the code more robust and secure, especially when handling large or model-controlled input sizes.

…erflow SamplingState<T>::Init computed int total_count = batch_size * vocab_size as a bare int*int multiply with model-controlled operands, then wrapped the already-overflowed result in SafeInt<size_t>. SafeInt rejected the negative-wrap case but silently accepted positive-wrap (e.g. 4 * 0x40000001 wraps to 4), under-sizing sorted_scores / cumulative_probs. The companion next_token_scores buffer sizes the same product correctly via SafeInt<size_t>(batch_size) * vocab_size, so the later memcpy in SamplingCpuHelper::Sample copies the large size into the small buffer -- a heap-buffer-overflow WRITE triggerable by a hostile .onnx model with a com.microsoft::Sampling node. Fix: compute the product in SafeInt's checked domain by casting an operand first, matching the pattern already used for next_token_scores. Apply the same operand-first pattern to the batch_size * max_iter site and to SafeInt<size_t>(batch_size + 1) (which itself could wrap in int).

Copilot

Pull request overview

This PR hardens SamplingState::Init in the generation/transformers greedy-search implementation by moving buffer element-count computations into SafeInt<size_t> so integer overflow can’t lead to under-allocation and downstream memory errors.

Changes:

Compute batch_size * vocab_size using SafeInt<size_t> to prevent overflow before buffer allocation.
Reuse the checked total_count across CPU/CUDA allocations in SamplingState.

tianleiwu

The production change in SamplingState::Init is correct and a genuine hardening win. int total_count = batch_size * vocab_size; is replaced with a checked SafeInt<size_t> product before any value can positive-wrap, and the two remaining in-int sub-expressions (batch_size + 1, batch_size * max_iter) are fixed the same way. Since batch_size (BatchBeamSize()) and vocab_size are model-controlled ints, this is a reachable defensive fix.

One concern on the test file (inline): the added tests re-implement the SafeInt expression inside the test itself and never call the production code, so they cannot fail if someone reverts SamplingState::Init to int * int — which contradicts the test's own comment. They only prove that SafeInt<size_t> throws on negative operands, a property of SafeInt rather than of this PR.

Minor (optional): safeint.h already exposes SafeMul<size_t>(batch_size, vocab_size) which does exactly this multiply; not required since the explicit form matches the existing next_token_scores pattern.

Verdict: COMMENT — production fix looks good; please make the regression test actually exercise the production path (or correct the misleading comment).

tianleiwu · 2026-07-03T00:41:57Z

+//   const SafeInt<size_t> total_count =
+//       SafeInt<size_t>(batch_size) * SafeInt<size_t>(vocab_size);
+size_t ComputeSamplingTotalCount(int batch_size, int vocab_size) {
+  return SafeInt<size_t>(batch_size) * SafeInt<size_t>(vocab_size);


This test is tautological: ComputeSamplingTotalCount is a copy of the SafeInt expression defined here in the test, and it never includes or calls greedy_search_impl_base.h. Reverting the production code to int * int or static_cast<size_t>(...) would leave this test green, which directly contradicts the header comment above ("They will fail if anyone reverts the production code..."). As written, these tests only verify that SafeInt<size_t> multiplication throws on negative operands — a property of SafeInt, not of this change.

Suggestion: extract the size computation into a small function in the production header (e.g. inline SafeInt<size_t> SamplingBufferElementCount(int batch_size, int vocab_size)) and have both Init and this test call it, so a revert genuinely breaks the test. If you keep the mirrored expression, please at least fix the comment so it no longer claims to catch a production revert.

apsonawane added 2 commits June 30, 2026 10:37

Merge branch 'main' into asonawane/edge-1

4b2f583

apsonawane requested a review from Copilot June 30, 2026 17:40

Copilot started reviewing on behalf of apsonawane June 30, 2026 17:40 View session

Copilot AI reviewed Jun 30, 2026

View reviewed changes

Comment thread onnxruntime/contrib_ops/cpu/transformers/greedy_search_impl_base.h Outdated

apsonawane added 3 commits June 30, 2026 12:40

Fix comments

29aaa4f

Update tests

bf1b5e0

Fix pipeline

89975f1

tianleiwu reviewed Jul 3, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix signed-int overflow in SamplingState::Init to prevent heap-buffer-overflow#29443

Fix signed-int overflow in SamplingState::Init to prevent heap-buffer-overflow#29443
apsonawane wants to merge 5 commits into
mainfrom
asonawane/edge-1

apsonawane commented Jun 30, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

tianleiwu left a comment

Uh oh!

tianleiwu Jul 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

apsonawane commented Jun 30, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

tianleiwu left a comment

Choose a reason for hiding this comment

Uh oh!

tianleiwu Jul 3, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants