geotiff: read minimal source window for VRT resample sources#1821

Merged: brendancol merged 1 commit into main from issue-1704 on May 13, 2026


@brendancol

Closes #1704.

Computes the inverse of the nearest-neighbour mapping from the clipped destination window back to the minimal source sub-window, then reads only that sub-window. This cuts decode memory from full-SrcRect-sized, and the resample intermediate from full-DstRect-sized, to user-window-sized.

The new _resample_nearest_window helper preserves the per-pixel mapping used by the existing full-rect path, so output for the windowed case is byte-identical to a full read sliced after the fact. Regression tests in test_vrt_resample_window_inverse_1704.py lock that contract in across upsample, downsample, and non-integer ratios.

The per-source pixel-budget guard from #1737 now applies to the clipped sub-window rather than the raw DstRect. The huge-DstRect attack vector is already neutralised by the windowed read; the guard is retained as defence in depth. The existing #1737 tests are updated to reflect this.

Tested: pytest xrspatial/geotiff/tests/test_vrt_resample_window_inverse_1704.py

…ple (closes #1704)

Previously when a VRT SimpleSource had a SrcRect/DstRect size mismatch and the
caller passed a small ``window=``, ``read_vrt`` decoded the full SrcRect from
disk and built a full DstRect-sized resample intermediate before slicing out
the clip subregion. For large source rects this forced multi-gigabyte decodes
and intermediates on tiny output windows.

Now the resample path inverts the nearest-neighbour mapping: for the clipped
destination sub-window it computes the smallest SrcRect-relative range of rows
and columns that ``_resample_nearest`` would gather from, reads only that
sub-rect, and resamples directly into the sub-window output. The new
``_resample_nearest_window`` helper applies the same per-pixel
``floor((d + 0.5) * src / dst)`` mapping the full-rect path uses, so output
is byte-identical to a full read sliced after the fact.
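
A minimal pure-Python sketch of the inversion described above, for illustration only. The names `nn_src_index`, `src_range`, `resample_full`, `resample_window`, and the `read_sub` callback are hypothetical stand-ins, not the helpers in `_vrt.py`, and the integer form of the `floor((d + 0.5) * src / dst)` mapping is an assumption made here to sidestep float rounding.

```python
def nn_src_index(d, src_size, dst_size):
    """Source index that destination index d gathers from.

    Integer form of floor((d + 0.5) * src_size / dst_size); using integer
    arithmetic is an assumption of this sketch.
    """
    return ((2 * d + 1) * src_size) // (2 * dst_size)


def src_range(d0, d1, src_size, dst_size):
    """Smallest half-open source range [s0, s1) feeding destination [d0, d1)."""
    # The mapping is non-decreasing in d, so the two endpoints bound the range.
    s0 = nn_src_index(d0, src_size, dst_size)
    s1 = nn_src_index(d1 - 1, src_size, dst_size) + 1
    return s0, s1


def resample_full(src, dst_h, dst_w):
    """Reference path: resample the whole source to (dst_h, dst_w)."""
    src_h, src_w = len(src), len(src[0])
    return [[src[nn_src_index(r, src_h, dst_h)][nn_src_index(c, src_w, dst_w)]
             for c in range(dst_w)] for r in range(dst_h)]


def resample_window(read_sub, src_shape, dst_shape, window):
    """Windowed path: read only the minimal source sub-rect, then gather."""
    (src_h, src_w), (dst_h, dst_w) = src_shape, dst_shape
    r0, r1, c0, c1 = window  # half-open destination window
    sr0, sr1 = src_range(r0, r1, src_h, dst_h)
    sc0, sc1 = src_range(c0, c1, src_w, dst_w)
    sub = read_sub(sr0, sr1, sc0, sc1)  # the only source read
    # Same per-pixel mapping as the full path, shifted by the sub-rect origin.
    return [[sub[nn_src_index(r, src_h, dst_h) - sr0]
                [nn_src_index(c, src_w, dst_w) - sc0]
             for c in range(c0, c1)] for r in range(r0, r1)]


# Usage: the 7-to-11 non-integer ratio with an interior window.
src = [[r * 7 + c for c in range(7)] for r in range(7)]
reads = []  # records every sub-rect requested from the "source"


def read_sub(r0, r1, c0, c1):
    reads.append((r0, r1, c0, c1))
    return [row[c0:c1] for row in src[r0:r1]]


full = resample_full(src, 11, 11)
win = resample_window(read_sub, (7, 7), (11, 11), (3, 6, 4, 9))
assert win == [row[4:9] for row in full[3:6]]  # parity with full-then-slice
```

Because both paths evaluate the same index function per destination pixel, parity follows by construction; the windowed path only changes which part of the source is held in memory.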

The per-source pixel-budget guard from #1737 now applies to the clipped
sub-window rather than the raw DstRect. The huge-DstRect attack vector is
already neutralised by the windowed read; the guard is retained as defence
in depth.
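
As a rough illustration of the rescoped guard, the sketch below measures the budget against the window clipped to the DstRect rather than the DstRect itself. The function names, the `(row0, row1, col0, col1)` half-open tuple layout, and the use of `ValueError` are assumptions of this sketch; only the `max_pixels` name comes from the PR.

```python
def clip_to_dst_rect(window, dst_rect):
    """Intersect a half-open (row0, row1, col0, col1) window with a DstRect."""
    wr0, wr1, wc0, wc1 = window
    dr0, dr1, dc0, dc1 = dst_rect
    r0, r1 = max(wr0, dr0), min(wr1, dr1)
    c0, c1 = max(wc0, dc0), min(wc1, dc1)
    if r0 >= r1 or c0 >= c1:
        return None  # the source does not touch the requested window
    return r0, r1, c0, c1


def check_sub_window_budget(sub_window, max_pixels):
    """Return the sub-window pixel count, raising if it exceeds the budget."""
    r0, r1, c0, c1 = sub_window
    pixels = (r1 - r0) * (c1 - c0)
    if pixels > max_pixels:
        # Exception type is an assumption; the real guard may differ.
        raise ValueError(
            f"clipped sub-window needs {pixels} pixels, budget is {max_pixels}"
        )
    return pixels


# A huge DstRect with a small window stays within budget.
huge_dst_rect = (0, 10**6, 0, 10**6)
small = clip_to_dst_rect((100, 164, 200, 264), huge_dst_rect)
small_pixels = check_sub_window_budget(small, max_pixels=10_000)

# An oversized window against the same DstRect is still rejected.
big = clip_to_dst_rect((0, 1000, 0, 1000), huge_dst_rect)
try:
    check_sub_window_budget(big, max_pixels=10_000)
    big_rejected = False
except ValueError:
    big_rejected = True
```

Under this scheme the size of the DstRect alone never triggers the guard; only the amount of output a source actually contributes does.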

Regression coverage in test_vrt_resample_window_inverse_1704.py: byte-identical
parity for 4x upsample, 4x downsample, and the 7-to-11 non-integer ratio across
several window offsets; edge-aligned windows at origin and last pixel; a
window crossing two SimpleSources; nodata round-trip; and a read-bound check
asserting the source read shrinks under the new path.
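
The parity and read-bound checks can be approximated in one dimension with an exhaustive sweep over every window. This is a hedged sketch rather than the contents of the regression file: `nn_index`, `windowed`, and `check_parity` are hypothetical names, and the read bound asserted here is one derived from the mapping, not necessarily the bound the real test uses.

```python
def nn_index(d, src, dst):
    """Integer form of floor((d + 0.5) * src / dst) (assumed for this sketch)."""
    return ((2 * d + 1) * src) // (2 * dst)


def windowed(src_vals, dst, d0, d1):
    """Resample destination range [d0, d1), reading only the needed source span."""
    src = len(src_vals)
    s0 = nn_index(d0, src, dst)
    s1 = nn_index(d1 - 1, src, dst) + 1
    sub = src_vals[s0:s1]  # the only source read
    out = [sub[nn_index(d, src, dst) - s0] for d in range(d0, d1)]
    return out, s1 - s0


def check_parity(src, dst):
    """Compare windowed output with full-then-slice for every window."""
    src_vals = list(range(src))
    full = [src_vals[nn_index(d, src, dst)] for d in range(dst)]
    checked = 0
    for d0 in range(dst):
        for d1 in range(d0 + 1, dst + 1):
            out, n_read = windowed(src_vals, dst, d0, d1)
            assert out == full[d0:d1]
            # Read bound: n_read < (n - 1) * src / dst + 2 for a window of n pixels.
            n = d1 - d0
            assert n_read * dst < (n - 1) * src + 2 * dst
            checked += 1
    return checked


# 4x upsample, 4x downsample, and the 7-to-11 non-integer ratio.
counts = {(s, d): check_parity(s, d) for s, d in [(4, 16), (16, 4), (7, 11)]}
```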
github-actions bot added the "performance" label (PR touches performance-sensitive code) on May 13, 2026
brendancol requested a review from Copilot on May 13, 2026 16:37

Copilot AI left a comment

Pull request overview

Optimizes read_vrt so that VRT SimpleSources requiring nearest-neighbour resampling (mismatched SrcRect/DstRect sizes) decode only the minimal source sub-rect that feeds the caller's clipped destination window, instead of the full SrcRect. The per-source pixel-budget guard from #1737 is rescoped to the clipped sub-window since the windowed read already neutralizes the huge-DstRect DoS vector.

Changes:

  • Add _nn_src_index and _resample_nearest_window helpers that invert the existing nearest-neighbour mapping for arbitrary sub-windows while remaining byte-identical to "resample full, then slice."
  • Replace the full-SrcRect read in the needs_resample branch of read_vrt with a windowed read driven by the inverse mapping; rescope the max_pixels resample-intermediate guard to the clipped sub-window.
  • Update issue #1737 regression tests to reflect new bounded-by-window semantics and add new #1704 regression tests asserting parity and bounded reads.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| xrspatial/geotiff/_vrt.py | New _nn_src_index/_resample_nearest_window helpers; needs_resample branch now reads minimal source sub-rect and resamples directly into sub-window; per-source pixel-budget cap measured against sub-window. |
| xrspatial/geotiff/tests/test_vrt_resample_window_inverse_1704.py | New regression suite locking byte-identical parity with full-then-slice across upsample/downsample/non-integer ratios, edge alignment, multi-source clipping, nodata round-trip, and a read-bound assertion via read_to_array spy. |
| xrspatial/geotiff/tests/test_vrt_dstrect_resample_cap_1737.py | Updates #1737 tests for new clipped-sub-window cap semantics: huge DstRect no longer pre-rejected; per-source cap now exercised by oversized sub-windows; non-tiled source write to avoid tile-size collision with small max_pixels. |

brendancol merged commit b2ef350 into main on May 13, 2026 (16 checks passed)

Copilot AI added a commit that referenced this pull request on May 13, 2026:

Keep _read_vrt_chunked dispatch (handles gpu=True + chunks=) over the
non-GPU-capable _read_vrt_dask added in #1807. Remove the now-dead
_read_vrt_dask and _vrt_effective_dtype functions that were only
reachable via the superseded dispatch branch.

Auto-merged from main: _vrt.py (VRT resample window inverse #1704 +
XML size cap #1815 + minimal source window #1821), test files
test_read_vrt_lazy_chunks_1798.py, test_vrt_dstrect_resample_cap_1737.py,
test_vrt_resample_window_inverse_1704.py, test_vrt_xml_size_cap_1815.py.

Co-authored-by: brendancol <433221+brendancol@users.noreply.github.com>

Labels

performance PR touches performance-sensitive code

Development

Successfully merging this pull request may close these issues.

Optimize read_vrt window for needs_resample sources

2 participants