geotiff: route read_vrt(chunks=) XML read through size cap (#1831)#1832
Merged
Merged
Conversation
PR #1818 capped VRT XML reads at 64 MiB (configurable via XRSPATIAL_VRT_MAX_XML_BYTES) in xrspatial/geotiff/_vrt.py::_read_vrt_xml to keep a multi-GB attacker-supplied VRT file from exhausting memory at parse time. The chunked dispatcher merged in #1822 (_read_vrt_chunked in xrspatial/geotiff/__init__.py) parses the VRT independently with a bare open().read() before calling parse_vrt, so the cap did not apply to read_vrt(path, chunks=...). The per-task _vrt_chunk_read worker calls _read_vrt_internal (which goes through _read_vrt_xml), so only the parent dispatch path was uncapped. Route the chunked-path XML read through _read_vrt_xml so the same cap and env-var override apply on both eager and chunked entry points. Regression test pins the small-cap rejection, the default-cap success, and the raised-cap success for chunks=.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR closes a security regression where read_vrt(..., chunks=...) bypassed the VRT XML size cap introduced in #1818 by performing an unbounded open().read(). It routes the chunked dispatcher’s upfront XML load through _vrt._read_vrt_xml so both eager and chunked entry points consistently enforce XRSPATIAL_VRT_MAX_XML_BYTES.
Changes:
- Update
_read_vrt_chunkedto read XML via_read_vrt_xmlinstead ofopen().read(), applying the same 64 MiB default cap and env-var override. - Add regression tests covering cap rejection, default-cap success, and raised-cap success for the chunked (
chunks=) path.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
xrspatial/geotiff/__init__.py |
Routes chunked-path VRT XML reading through _read_vrt_xml to enforce the size cap consistently. |
xrspatial/geotiff/tests/test_vrt_xml_size_cap_chunked_1831.py |
Adds regression tests ensuring chunks= honors XRSPATIAL_VRT_MAX_XML_BYTES. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
PR #1818 capped VRT XML reads at 64 MiB (override via
XRSPATIAL_VRT_MAX_XML_BYTES) in_vrt._read_vrt_xmlso an attacker-supplied multi-GB VRT file cannot exhaust host memory at parse time. The lazy chunked dispatcher landed in #1822 (_read_vrt_chunked) parses the VRT independently with a bareopen(source, 'r').read(), bypassing that cap; only the per-task worker reuses the capped reader.This routes the chunked-path XML read through
_read_vrt_xmlso the same cap and env-var override apply on both eager and chunked entry points. Closes #1831.Repro (pre-fix)
After this PR both raise the same
ValueErrorreferencingXRSPATIAL_VRT_MAX_XML_BYTES.Test plan
xrspatial/geotiff/tests/test_vrt_xml_size_cap_chunked_1831.py- new file, 3 tests pinning small-cap rejection, default-cap success, raised-cap success forchunks=.test_vrt_xml_size_cap_1815.py(eager path) still passes.test_vrt_lazy_chunks_1814.py(chunked happy path) still passes.Found by automated security sweep (
deep-sweep-security-geotiff-2026-05-13-s4), Cat 1 (unbounded allocation), severity MEDIUM.