Skip to content

Fix GQA out-of-bounds read in past KV buffer (CPU)#29447

Open
tianleiwu wants to merge 2 commits into
microsoft:mainfrom
tianleiwu:tlwu/20260630/icm_gqa_oob
Open

Fix GQA out-of-bounds read in past KV buffer (CPU)#29447
tianleiwu wants to merge 2 commits into
microsoft:mainfrom
tianleiwu:tlwu/20260630/icm_gqa_oob

Conversation

@tianleiwu

Copy link
Copy Markdown
Contributor

Description

Fixes an out-of-bounds (OOB) read in the CPU GroupQueryAttention (GQA) operator. During token generation (decode), ConcatStateChunkGQA copies seqlens_k + 1 - sequence_length rows out of the past key/value buffers. The existing validation only bounded the present buffer write (seqlens_k < present_kv_seqlen), but the present buffer can be larger than the past buffer when total_sequence_length exceeds the past sequence length. A large seqlens_k combined with a small past buffer therefore read past the end of the past key/value tensors.

Motivation and Context

present_kv_seqlen = max(total_sequence_length, past_sequence_length), so the pre-existing seqlens_k < present_kv_seqlen check does not bound the past-side read when total_sequence_length > past_sequence_length. With a crafted seqlens_k, the decode path in ConcatStateChunkGQA reads seqlens_k + 1 - sequence_length rows from the smaller past buffer, causing an OOB read.

Key Changes

File Change
onnxruntime/contrib_ops/cpu/bert/group_query_attention.cc Add a per-batch bound in the seqlens_k validation block: for the decode case (past_kv_seqlen > 0 && kv_sequence_length != 0 && !is_first_prompt), reject when seqlens_k + 1 - sequence_length > past_kv_seqlen.
onnxruntime/test/contrib_ops/group_query_attention_op_test.cc Add regression test SeqlensKExceedsPastBuffer_OOBRead exercising a large seqlens_k against a small past buffer.

Shared KV (kv_sequence_length == 0) appends no new KV and its past read is already bounded by the present-buffer check together with the total_sequence_length <= seqlen_past_kv_cache enforcement in the apply-attention paths, so it needs no additional check.

Testing Notes

  • Build and run the GQA tests:
    cmake --build build/Linux/Debug --target onnxruntime_provider_test -j$(nproc)
    ./build/Linux/Debug/onnxruntime_provider_test --gtest_filter="*GroupQueryAttention*"
    
  • All 41 CPU GQA tests pass, including the new SeqlensKExceedsPastBuffer_OOBRead regression and the existing shared-KV CPU cases.
  • lintrunner reports no issues on the changed files.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a security/correctness issue in the CPU GroupQueryAttention contrib op where seqlens_k validation did not fully bound reads from the past KV cache during non-prompt execution, potentially leading to an out-of-bounds read. It adds an additional bounds check for the number of past rows copied and introduces a regression test covering the reported scenario.

Changes:

  • Add a decode-path validation to ensure (seqlens_k + 1 - sequence_length) does not exceed the past KV buffer sequence length.
  • Add a regression test (SeqlensKExceedsPastBuffer_OOBRead) to exercise the large-seqlens_k / small-past-buffer case.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
onnxruntime/contrib_ops/cpu/bert/group_query_attention.cc Adds past-buffer bounds validation for decode to prevent past KV OOB reads.
onnxruntime/test/contrib_ops/group_query_attention_op_test.cc Adds a regression test to ensure invalid seqlens_k is rejected when it would over-read past KV.

// Shared KV (kv_sequence_length == 0) appends no new KV and its past read is already
// bounded by the present-buffer check together with the total_sequence_length <=
// seqlen_past_kv_cache enforcement in the apply-attention paths, so it needs no check here.
if (past_kv_seqlen > 0 && parameters.kv_sequence_length != 0 && !parameters.is_first_prompt) {
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants