Skip to content

geotiff: apply nodata mask against post-MinIsWhite sentinel (#1809)#1817

Merged
brendancol merged 3 commits into
mainfrom
deep-sweep-accuracy-geotiff-2026-05-13-1778686230
May 13, 2026
Merged

geotiff: apply nodata mask against post-MinIsWhite sentinel (#1809)#1817
brendancol merged 3 commits into
mainfrom
deep-sweep-accuracy-geotiff-2026-05-13-1778686230

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

Closes #1809.

MinIsWhite inversion was running before the sentinel-to-NaN nodata mask on every read backend (eager numpy, eager GPU, GPU stripped fallback, dask chunk closure). Because the inversion rewrites the original sentinel value (uint8 nodata=0 -> 255, float32 nodata=-9999 -> 9999), the post-inversion equality check matched the wrong pixels:

  • stored values that equalled the sentinel survived as iinfo.max - sentinel instead of NaN
  • stored values that happened to equal iinfo.max - sentinel were incorrectly turned into NaN

All four backends agreed on the same wrong answer, so backend-parity tests caught nothing.

Fix

Introduce _miniswhite_inverted_nodata() in _reader.py and stash the inverted sentinel on geo_info._mask_nodata. Every backend now routes its sentinel-to-NaN mask through that field, while attrs['nodata'] keeps the original sentinel for write-side round-trip.

The dask graph builder picks up _ifd_photometric and _ifd_samples_per_pixel from _read_geo_info / _parse_cog_http_meta so the closure nodata is inverted at graph-build time rather than per chunk.

Test plan

  • 9 regression tests in test_miniswhite_nodata_1809.py covering:
    • uint8 nodata=0 across eager numpy, dask, and GPU
    • uint16 nodata=65535 (sentinel at dtype max)
    • float32 nodata=-9999
    • no-collision sentinel (existing path still works)
    • no nodata at all (output stays integer dtype)
    • backend-parity assertion across all available backends
  • All 2424 non-stale geotiff tests pass; the 4 pre-existing failures (test_xrs_plot_no_palette, test_plot_geotiff_deprecated, test_tile_size_positive_works, test_dstrect_at_cap_succeeds) reproduce on main and are unrelated.
  • Existing test_miniswhite_backend_parity_1797.py and test_min_is_white_inversion still pass unchanged.

MinIsWhite inversion was running before the sentinel-to-NaN nodata
mask on all four read backends. Because the inversion rewrites the
sentinel value (uint8 nodata=0 -> 255, float32 nodata=-9999 -> 9999),
the post-inversion equality check matched the wrong pixels:

* stored values that equalled the sentinel survived as iinfo.max -
  sentinel instead of NaN
* stored values that happened to equal iinfo.max - sentinel were
  incorrectly turned into NaN

Introduces _miniswhite_inverted_nodata() in _reader.py and stashes the
inverted sentinel on geo_info._mask_nodata. Every backend (eager numpy,
eager GPU, GPU stripped fallback, dask chunk closure) routes its mask
through the new field while attrs['nodata'] keeps the original sentinel
for write-side round-trip. The dask graph builder picks up the IFD
photometric off geo_info via _read_geo_info / _parse_cog_http_meta so
the closure nodata is inverted at graph-build time.

9 regression tests in test_miniswhite_nodata_1809.py cover uint8 with
nodata=0, uint16 with nodata=65535, float32 with nodata=-9999 across
numpy, dask, and GPU backends, plus no-collision and no-nodata controls.

Closes #1809
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 13, 2026
@brendancol brendancol requested a review from Copilot May 13, 2026 16:45
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes MinIsWhite GeoTIFF reads so nodata masking compares against the post-inversion sentinel while preserving the original nodata value in attrs for round-trip writes.

Changes:

  • Adds _miniswhite_inverted_nodata() and threads inverted sentinel handling through eager, dask, HTTP COG, and GPU read paths.
  • Preserves original attrs['nodata'] while using an internal mask sentinel for data masking.
  • Adds regression tests for MinIsWhite + nodata behavior across available backends.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
xrspatial/geotiff/_reader.py Computes/stashes MinIsWhite-adjusted nodata sentinel during array reads.
xrspatial/geotiff/__init__.py Uses adjusted sentinel in eager, dask, and GPU nodata masking paths.
xrspatial/geotiff/tests/test_miniswhite_nodata_1809.py Adds regression coverage for MinIsWhite nodata masking cases.
.claude/sweep-accuracy-state.csv Updates sweep state notes for the fixed geotiff issue.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


import importlib.util
import os
import tempfile
Comment thread xrspatial/geotiff/__init__.py Outdated
Comment on lines +3394 to +3397
# When MinIsWhite was applied, the mask must use the inverted
# sentinel; otherwise the original sentinel.
_gpu_mask_value = (_mw_mask_nodata if _mw_mask_nodata is not None
else nodata)
Remove unused os/tempfile imports from the MinIsWhite regression test
and route the GPU tiled read's CPU-fallback branches through
GeoInfo._mask_nodata so a sparse-tile, planar=2 auto-fallback, or
final-fallback decode of a MinIsWhite raster masks against the
post-inversion sentinel rather than the original.
PR #1803 forwarded the caller's max_pixels to read_to_array inside
read_vrt's source loop so a tiny VRT output cannot force a huge source
decode (#1796). The output-window check at the source read enforces that
correctly. A separate per-tile dimension check at the same call sites
also consumed the caller's max_pixels, so a caller setting max_pixels as
an output budget (e.g. 10_000) failed the per-tile sanity check on any
normal source whose default tile size is 256x256 (= 65_536 pixels).

Use MAX_PIXELS_DEFAULT for the per-tile dim check at the two call sites
in _read_tiles (local) and _read_tiles_cog_http (HTTP). The output-window
check at the same functions continues to enforce the user-supplied
max_pixels, preserving the #1796 protection.
@brendancol brendancol merged commit 66e36f4 into main May 13, 2026
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MinIsWhite inversion runs before nodata mask, swapping data and nodata pixels

2 participants