Skip to content

geotiff: cap VRT XML read size#1818

Merged
brendancol merged 2 commits into
mainfrom
issue-1815
May 13, 2026
Merged

geotiff: cap VRT XML read size#1818
brendancol merged 2 commits into
mainfrom
issue-1815

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #1815.

Adds a 64 MiB default cap on VRT XML reads, configurable via XRSPATIAL_VRT_MAX_XML_BYTES. Streams the file with a running size check and raises ValueError when the cap is exceeded.

Tested: pytest xrspatial/geotiff/tests/test_vrt_xml_size_cap_1815.py

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 13, 2026
@brendancol brendancol requested a review from Copilot May 13, 2026 16:39
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a configurable byte cap (default 64 MiB, via XRSPATIAL_VRT_MAX_XML_BYTES) on VRT XML reads in _vrt.read_vrt to prevent memory exhaustion from pathologically large VRT XML files, addressing issue #1815. Reading is now streamed in 64 KiB chunks with a running size check that raises ValueError when exceeded.

Changes:

  • Add _get_vrt_max_xml_bytes (env-driven cap with validation) and _read_vrt_xml (bounded streaming reader) helpers in _vrt.py.
  • Replace unbounded f.read() in read_vrt with the new bounded reader.
  • Add regression tests covering under-cap success, over-cap failure, env-var override, and invalid cap values.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
xrspatial/geotiff/_vrt.py Introduces env-configurable XML size cap and streaming reader used by read_vrt.
xrspatial/geotiff/tests/test_vrt_xml_size_cap_1815.py New tests verifying cap behavior, env override, and invalid-cap handling.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

PR #1803 forwarded the caller's max_pixels to read_to_array inside
read_vrt's source loop so a tiny VRT output cannot force a huge source
decode (#1796). The output-window check at the source read enforces that
correctly. A separate per-tile dimension check at the same call sites
also consumed the caller's max_pixels, so a caller setting max_pixels as
an output budget (e.g. 10_000) failed the per-tile sanity check on any
normal source whose default tile size is 256x256 (= 65_536 pixels).

Use MAX_PIXELS_DEFAULT for the per-tile dim check at the two call sites
in _read_tiles (local) and _read_tiles_cog_http (HTTP). The output-window
check at the same functions continues to enforce the user-supplied
max_pixels, preserving the #1796 protection.
@brendancol brendancol merged commit 4b1de37 into main May 13, 2026
10 of 11 checks passed
brendancol added a commit that referenced this pull request May 14, 2026
…1832)

PR #1818 capped VRT XML reads at 64 MiB (configurable via
XRSPATIAL_VRT_MAX_XML_BYTES) in xrspatial/geotiff/_vrt.py::_read_vrt_xml
to keep a multi-GB attacker-supplied VRT file from exhausting memory at
parse time.

The chunked dispatcher merged in #1822 (_read_vrt_chunked in
xrspatial/geotiff/__init__.py) parses the VRT independently with a
bare open().read() before calling parse_vrt, so the cap did not apply
to read_vrt(path, chunks=...). The per-task _vrt_chunk_read worker
calls _read_vrt_internal (which goes through _read_vrt_xml), so only
the parent dispatch path was uncapped.

Route the chunked-path XML read through _read_vrt_xml so the same cap
and env-var override apply on both eager and chunked entry points.

Regression test pins the small-cap rejection, the default-cap success,
and the raised-cap success for chunks=.
@brendancol brendancol deleted the issue-1815 branch May 15, 2026 04:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

geotiff: VRT XML parser reads file without a size cap

2 participants