Skip to content

Harden ConvTranspose pad computation with SafeInt and consistency checks#29446

Merged
yuslepukhin merged 4 commits into
mainfrom
yuslepukhin/compute_total_pad
Jul 1, 2026
Merged

Harden ConvTranspose pad computation with SafeInt and consistency checks#29446
yuslepukhin merged 4 commits into
mainfrom
yuslepukhin/compute_total_pad

Conversation

@yuslepukhin

Copy link
Copy Markdown
Contributor

This pull request strengthens validation and error handling for the ConvTranspose operator in ONNX Runtime, particularly when using explicit output_shape attributes. It adds comprehensive checks to prevent inconsistent or invalid configurations, improves arithmetic safety to guard against integer overflows, and introduces a suite of targeted unit tests to verify these behaviors.

Validation and Error Handling Improvements:

  • Added stricter input validation in ConvTransposeAttributes::ComputePadAndOutputShape to ensure that all relevant parameters (output_shape, input size, stride, kernel, dilation, and output padding) are within valid ranges and to provide clear error messages when they are not. This includes checks that all values are positive and that output padding is non-negative.
  • Added a consistency check to verify that the explicit output_shape is compatible with the input dimensions and convolution parameters, preventing buffer overruns and logical inconsistencies.

Arithmetic Safety:

  • Updated ComputeTotalPad to use SafeInt for all intermediate arithmetic, ensuring that integer overflows are detected and handled safely instead of producing undefined behavior.

Testing Enhancements:

  • Added a comprehensive set of unit tests for ConvTranspose with explicit output_shape, including cases for invalid, inconsistent, and overflow-prone configurations, as well as valid edge cases (e.g., 1D, 2D, and 3D, large batch sizes, group > 1, and cases requiring padding). These tests verify that invalid configurations are rejected and that valid ones work as expected.

These changes collectively improve the robustness, correctness, and maintainability of the ConvTranspose operator's implementation and its handling of explicit output shapes.

- Wrap ComputeTotalPad arithmetic in SafeInt<int64_t> to prevent signed
  integer overflow (matching ComputePad and ComputeOutputShape in the
  same header).
- Add input validation in the explicit output_shape path of
  ComputeTransposePadAndOutputShape (stride, kernel, dilation, adj,
  in_size, out_size).
- Add a post-computation consistency check verifying that the
  forward-conv re-derivation of input size matches the actual input
  size, preventing pad/buffer size mismatches.
- Add unit tests covering: inconsistent output_shape (1D, 2D, 3D),
  zero output_shape, stride/dilation variants, group>1, large batch,
  valid output_shape requiring padding, and arithmetic overflow (guarded
  by ORT_NO_EXCEPTIONS).

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

The OverflowInPadComputation test was crashing during AddInput because
TensorShape::Size() overflowed for the huge W dimensions. Fix by using
a valid small W shape {1,1,3,3} with kernel_shape={3,3} attribute and
extreme dilation values that trigger SafeInt overflow in the pad
computation arithmetic instead.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment thread onnxruntime/core/providers/cpu/nn/conv_transpose_attributes.h
Comment thread onnxruntime/test/providers/cpu/nn/conv_transpose_op_test.cc
@titaiwangms

Copy link
Copy Markdown
Contributor

Review summary — ConvTranspose explicit output_shape hardening

Bottom line: the core change is sound and a genuine safety improvement. The central question — does the new derived_in != in_size check falsely reject spec-legal configs? — resolves in the PR's favor. No blocking issues. The most valuable finding is a pre-existing OOB that this PR leaves unfixed on a sibling code path (out of scope here, but worth a follow-up).

The crux, verified

Col2im (in conv_transpose.cc) is driven by the output dims and consumes exactly derived_in = (out + pads − dkernel)/stride + 1 columns from a buffer sized for in_size. So derived_in > in_size is a real heap over-read, not a cosmetic mismatch.

Algebra: derived_in = in + floor(adj/stride).

  • adj < stride → passes for the entire legal auto-pad range (out ≤ natural). ✔ No false rejection.
  • Rejected cases are exactly out > natural (spec-illegal — implies negative pads) and adj ≥ stride (only reachable when dilation > stride). In both, the existing Col2im would have over-read — so no correct behavior is lost; the check converts a latent OOB into a clean Status. ONNX spec prose is ambiguous here ("less than the corresponding stride/dilation"), and the onnx reference enforces no such bound, so the fail-safe is defensible.

Findings

🔴 Critical — pre-existing & out-of-scope (does not block this PR)

  • The same Col2im OOB survives on the implicit path (no output_shape, via output_padding). Reachable example: X[1,1,3,1], strides=[2,1], dilations=[3,1], kernel=[3,1], output_padding=[2,0] passes the adj < max(stride,dilation) pre-check → out=13, Col2im wants 4 columns from a 3-column buffer → over-read. This PR only guards the explicit-output_shape branch. Recommend extending the same derived_in guard (or tightening to adj < stride) to the implicit path in this or a follow-up PR.

🟡 Minor (in-scope, non-blocking)

  • Misleading diagnostic: when adj ≥ stride, the error blames output_shape though the real culprit is the output_padding×dilation interaction. Consider an explicit adj < stride guard or reworded message.
  • Missing positive test for valid explicit output_shape with output_padding > 0 — all new success tests use adj=0, so the adj term in the new check is untested on the success side.
  • Overflow tests use an empty expected-error substring ("") → they can pass on unrelated failures; assert a stable substring.
  • Test magic numbers contradict their comments (4611686018427387903 is INT64_MAX/2, not /4; 2305843009213693952 is 2^61). Prefer named constexpr expressions.
  • Comment vs. code: the comment describes a one-sided "exceeds" condition but the code tests !=. Harmless (equivalent in the reachable space) — a half-line note that derived_in < in_size is unreachable for adj ≥ 0 would prevent a future reader from suspecting over-strict rejection.
  • Failure tests exclude CUDA/WebGPU with a comment claiming they "perform their own validation," but they share ComputePadsAndOutputShape and would hit the same check — likely a coverage gap + inaccurate comment (worth confirming the routing).

🔵 Follow-up / out-of-scope (pre-existing)

  • WebNN (builder_utils.cc) and QNN (conv_op_builder.cc) do their own raw arithmetic / attribute parsing, bypassing both the new SafeInt protection (WebNN's in_size*stride can still overflow) and the consistency check. Consider unifying through ComputeTransposePadAndOutputShape.

✅ Confirmed clean

  • Dropping constexpr from ComputeTotalPad: no caller uses it in a constant-expression context. No build break.

Reviewed with a multi-model review pass (readability / code / critical / deep-spec / integration); crux verified by hand against the kernel, the ONNX 1.22 schema, and the onnx reference implementation.

- Add derived_in consistency check to the implicit path (no output_shape)
  to catch output_padding >= stride when dilation > stride, preventing
  the same Col2im OOB read on the implicit code path.
- Add boundary test (output_shape=9, natural=7) for stride=2 explicit
  output_shape with explanation of integer division truncation at 8.
- Add implicit path OOB test: output_padding=2 >= stride=2 with
  dilation=3 passes adj < max(stride, dilation) but fails consistency.
- Add output_padding success test reusing known-correct values from
  ConvTranspose_2D_outputpadding_strides2.
- Fix diagnostic: use output_padding instead of adj in error messages.
- Add comment that derived_in < in_size is algebraically unreachable.
- Use named constexpr for magic numbers in overflow tests.
- Assert stable 'Integer overflow' substring instead of empty string.
- Fix EP exclusion comments to accurately describe routing.
titaiwangms
titaiwangms previously approved these changes Jun 30, 2026

@titaiwangms titaiwangms left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

DML does not support 3D ConvTranspose and throws 'The parameter is
incorrect' during session initialization.
@yuslepukhin yuslepukhin merged commit 6f31cd6 into main Jul 1, 2026
86 checks passed
@yuslepukhin yuslepukhin deleted the yuslepukhin/compute_total_pad branch July 1, 2026 17:33
feich-ms pushed a commit that referenced this pull request Jul 3, 2026
…cks (#29446)

This pull request strengthens validation and error handling for the
ConvTranspose operator in ONNX Runtime, particularly when using explicit
`output_shape` attributes. It adds comprehensive checks to prevent
inconsistent or invalid configurations, improves arithmetic safety to
guard against integer overflows, and introduces a suite of targeted unit
tests to verify these behaviors.

**Validation and Error Handling Improvements:**

* Added stricter input validation in
`ConvTransposeAttributes::ComputePadAndOutputShape` to ensure that all
relevant parameters (`output_shape`, input size, stride, kernel,
dilation, and output padding) are within valid ranges and to provide
clear error messages when they are not. This includes checks that all
values are positive and that output padding is non-negative.
* Added a consistency check to verify that the explicit `output_shape`
is compatible with the input dimensions and convolution parameters,
preventing buffer overruns and logical inconsistencies.

**Arithmetic Safety:**

* Updated `ComputeTotalPad` to use `SafeInt` for all intermediate
arithmetic, ensuring that integer overflows are detected and handled
safely instead of producing undefined behavior.

**Testing Enhancements:**

* Added a comprehensive set of unit tests for `ConvTranspose` with
explicit `output_shape`, including cases for invalid, inconsistent, and
overflow-prone configurations, as well as valid edge cases (e.g., 1D,
2D, and 3D, large batch sizes, group > 1, and cases requiring padding).
These tests verify that invalid configurations are rejected and that
valid ones work as expected.

These changes collectively improve the robustness, correctness, and
maintainability of the ConvTranspose operator's implementation and its
handling of explicit output shapes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants