Skip to content

Fix signed-int overflow in SamplingState::Init to prevent heap-buffer-overflow#29443

Open
apsonawane wants to merge 5 commits into
mainfrom
asonawane/edge-1
Open

Fix signed-int overflow in SamplingState::Init to prevent heap-buffer-overflow#29443
apsonawane wants to merge 5 commits into
mainfrom
asonawane/edge-1

Conversation

@apsonawane

Copy link
Copy Markdown
Contributor

This pull request improves the safety of buffer size calculations in the SamplingState initialization logic by ensuring that all multiplications involving batch_size and vocab_size are safely performed using SafeInt<size_t>. This prevents potential integer overflow bugs that could lead to under-allocated buffers and memory errors.

Buffer allocation safety improvements:

  • All buffer size calculations that multiply batch_size and vocab_size now use SafeInt<size_t> to ensure checked arithmetic, preventing silent integer overflows that could cause heap-buffer-overflow issues. This includes allocations for both CPU and CUDA buffers in SamplingState. [1] [2]
  • The calculation for the buffer size of h_sampled_all now also safely casts max_iter to size_t before multiplication, further protecting against overflow.

These changes make the code more robust and secure, especially when handling large or model-controlled input sizes.

…erflow

SamplingState<T>::Init computed int total_count = batch_size * vocab_size as a bare int*int multiply with model-controlled operands, then wrapped the already-overflowed result in SafeInt<size_t>. SafeInt rejected the negative-wrap case but silently accepted positive-wrap (e.g. 4 * 0x40000001 wraps to 4), under-sizing sorted_scores / cumulative_probs. The companion next_token_scores buffer sizes the same product correctly via SafeInt<size_t>(batch_size) * vocab_size, so the later memcpy in SamplingCpuHelper::Sample copies the large size into the small buffer -- a heap-buffer-overflow WRITE triggerable by a hostile .onnx model with a com.microsoft::Sampling node.

Fix: compute the product in SafeInt's checked domain by casting an operand first, matching the pattern already used for next_token_scores. Apply the same operand-first pattern to the batch_size * max_iter site and to SafeInt<size_t>(batch_size + 1) (which itself could wrap in int).

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens SamplingState::Init in the generation/transformers greedy-search implementation by moving buffer element-count computations into SafeInt<size_t> so integer overflow can’t lead to under-allocation and downstream memory errors.

Changes:

  • Compute batch_size * vocab_size using SafeInt<size_t> to prevent overflow before buffer allocation.
  • Reuse the checked total_count across CPU/CUDA allocations in SamplingState.

Comment thread onnxruntime/contrib_ops/cpu/transformers/greedy_search_impl_base.h Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants