Skip to content

refactor: unify as_dataarray; split broadcasting from coords validation#726

Open
FBumann wants to merge 5 commits into
fix/mask-coords-broadcastfrom
feat/unify-as-dataarray-coords
Open

refactor: unify as_dataarray; split broadcasting from coords validation#726
FBumann wants to merge 5 commits into
fix/mask-coords-broadcastfrom
feat/unify-as-dataarray-coords

Conversation

@FBumann
Copy link
Copy Markdown
Collaborator

@FBumann FBumann commented May 24, 2026

Closes #723. Stacked on #725 (which is stacked on #722).

What changes

as_dataarray_in_coords is folded into as_dataarray and split along the seam between "broadcast arr against coords" and "enforce the coords contract":

  • as_dataarray(arr, coords) — the broadcasting primitive. For every input type, the result is aligned with coords: positional inputs (numpy / unnamed pandas / scalar) are labeled by position, shared-dim coords are reindexed when values are equal in a different order, dims present in coords but not in arr are expanded, and the result is transposed to coords order. Extra dims and disagreeing value sets on shared dims pass through, so xarray broadcasting in expression arithmetic keeps working.
  • assert_compatible_with_coords(arr, coords) — the validation companion. Raises if arr introduces dims not in coords (was: as_dataarray_in_coords's extra-dim raise) or if a shared dim has disagreeing coord values (was: its "do not match" raise).

add_variables and add_constraints now call as_dataarray followed by assert_compatible_with_coords for lower / upper / mask. The previous as_dataarray_in_coords helper is deleted.

_coords_to_dict now filters MultiIndex level coords out of xarray.Coordinates inputs, so the new strict-by-default path treats e.g. station (and not its derived letter / num levels) as the dim. This was already a latent issue once strict semantics governed as_dataarray's coords arg.

Audit summary (the 12 call sites listed in #723)

Call site Strictness Behavior
model.py:705/706 lower/upper in add_variables strict as_dataarray + assert_compatible_with_coords
model.py:715 mask in add_variables strict same
model.py:979 mask in add_constraints strict same
expressions.py:584/613/1105/1668/2154 arithmetic broadcast call signatures unchanged; benefits from positional labeling + reindex for free, with no value-set raise on shared dims
variables.py:330 to_linexpr(coefficient) broadcast unchanged
expressions.py:341/2002/2030/2289, model.py:912/919/972, variables.py:1369 no coords unaffected

Breaking changes for end users

This PR introduces no new user-visible breaking changes. It's a structural refactor that keeps the strict contract at the add_variables / add_constraints call sites via assert_compatible_with_coords. The breaking changes a user sees when this PR lands on master are the union of what #722 and #725 already deliver:

From #725 (mask handling)

Today (master) After this stack Migration
Mask with sparse coords silently fills missing entries with False and emits FutureWarning Raises ValueError mask.reindex({...}, fill_value=False)
Mask with extra dims raises AssertionError Raises ValueError Same fix; only the exception type changed
Mask as pd.Series / pd.DataFrame missing a dim was sometimes silently dropped Broadcast like bounds None needed (strictly more cases work)

From #722 (bounds handling)

Today (master) After this stack
Pandas Series / DataFrame bound missing a dim is silently dropped Broadcast to coords (#709 fix)
DataArray bound dim order depends on bound type (#706) Always follows coords order
DataArray bound with extra dim was silently accepted Raises ValueError
DataArray bound with mismatched coord values raised varied downstream errors Raises ValueError: ...coordinates do not match
Piecewise._broadcast_points had hash-randomized dim order across processes Iterates in declaration order

Refactor-only changes (no user-visible effect on the strict sites)

  • as_dataarray itself no longer raises on extra dims or shared-dim value-set mismatch — every public site that used to depend on those raises now gets them from the explicit assert_compatible_with_coords call. Surface behavior at add_variables / add_constraints is preserved.
  • Two existing tests updated to reflect the new "coords is source of truth, extras broadcast in" semantics of as_dataarray itself:
    • test_as_dataarray_with_ndarray_coords_dict_set_dims_not_aligned: extra coord entries now expand into the result (the test was pinning an undocumented "dims wins, coord entry dropped" corner of the old lax path).
    • test_dataarray_extra_dims: rewritten so the subset check fires (rather than the value-mismatch check) — the assertion still raises on extra dims, just via the new assert_compatible_with_coords helper.

Test plan

  • test/test_common.py — added five tests pinning the new split (extra-dim preservation, disjoint shared-dim values, assert_compatible_with_coords extra-dim raise, value-mismatch raise, subset-dims allowed).
  • Full suite: 3698 passed, 29 skipped (parity with base).
  • pre-commit run --all-files clean (ruff/format/blackdoc/codespell).
  • Microbenchmark (dev-scripts/benchmark_as_dataarray.py, untracked): flat timings vs the base branch on both add_variables-heavy (≈22 ms / 50–57 ms mean) and arithmetic-heavy (≈80–82 ms) workloads.

🤖 Generated with Claude Code

FBumann and others added 5 commits May 24, 2026 18:41
Closes #723. Folds the body of `as_dataarray_in_coords` into `as_dataarray`
and extracts the contract checks into `assert_compatible_with_coords`, so
linopy now has one broadcasting primitive and one validation companion.

`as_dataarray(arr, coords)` aligns the result against `coords` for every
input type: labels positional inputs (numpy / unnamed pandas / scalar) by
position, reindexes same-values-different-order, expands missing dims,
and transposes to coords order. Extra dims and disagreeing value sets on
shared dims pass through unchanged, so xarray broadcasting in expression
arithmetic keeps working.

`assert_compatible_with_coords(arr, coords)` enforces the strict contract
(`arr.dims ⊆ coords.dims`, plus exact coord-value equality on shared
dims). `add_variables` and `add_constraints` now call it after
`as_dataarray` for `lower` / `upper` / `mask`, replacing the deleted
`as_dataarray_in_coords` helper.

`_coords_to_dict` filters MultiIndex level coords out of
`xarray.Coordinates` inputs so the new strict-by-default path treats
`station` (and not its derived `letter` / `num` levels) as the dim.

Test suite: 3698 passed (no regressions). Two existing tests were
updated to reflect the new "coords is source of truth" semantics:
`test_as_dataarray_with_ndarray_coords_dict_set_dims_not_aligned`
(extra coord entries now broadcast in) and
`test_dataarray_extra_dims` (now triggers the subset check rather than
the value-mismatch check).

Microbenchmark in dev-scripts/benchmark_as_dataarray.py shows flat
timings vs the base branch on both add_variables-heavy and arithmetic-
heavy workloads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes a silent-failure gap in the strict coords-as-truth path: when the
caller passed ``coords=[[1, 2, 3]], dims=["x"]`` to ``add_variables``,
``_coords_to_dict`` returned an empty mapping (unnamed sequences carry
no dim name), so the strict checks short-circuited and bounds with
extra dims or mismatched values flowed through unchecked, producing
variables with frankenstein outer-joined coord values.

``_coords_to_dict`` now accepts an optional ``dims`` argument that
names unnamed sequence entries by position. ``as_dataarray`` and
``assert_compatible_with_coords`` plumb it through; ``add_variables``
forwards ``kwargs.get("dims")`` to the assertions for ``lower`` and
``upper``. ``coords=[[1, 2, 3]], dims=["x"]`` now enforces the same
contract as ``coords={"x": [1, 2, 3]}`` or
``coords=[pd.Index([1, 2, 3], name="x")]``.

Docstring of ``add_variables.coords`` documents the contract
(subset-of-dims, dim order, value match with auto-reindex, missing-dim
broadcast) and includes four doctests pinning it: the extra-dim raise,
the value-mismatch raise, the same-values-different-order auto-reindex,
and the unnamed-coords-plus-dims opt-in.

Test suite: 3698 passed (parity with the previous commit on this
branch). ``pytest --doctest-modules linopy/model.py -k add_variables``
also green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduce align_to_coords to wrap as_dataarray and assert_compatible_with_coords
with user-facing labels (lower bound, upper bound, mask). Errors now name the
argument and distinguish extra dimensions, coordinate mismatches, and conversion
failures. Extend mask validation to use coords+dims= when provided.

Co-authored-by: Cursor <cursoragent@cursor.com>
…coords

Three cleanups on top of align_to_coords:

- Drop the trailing ``.broadcast_like(data.labels)`` in ``add_variables``
  and ``add_constraints`` mask paths. ``as_dataarray`` already expands
  missing dims to ``coords`` shape, so the broadcast was a no-op.
- Stop overriding the caller's ``dims=`` in the ``add_variables`` mask
  path when ``coords is None``. The previous code stripped ``dims`` and
  forced ``dims=data.dims``; with ``data.coords`` being an xarray
  ``Coordinates`` with already-named dims, the user's ``dims`` is
  harmless to forward and the override was just hiding intent. Mask
  now goes through one ``align_to_coords`` call regardless of whether
  ``coords`` is supplied.
- Split the exception handler in ``align_to_coords``: ``TypeError`` from
  unsupported input types is re-raised as ``TypeError`` (still labeled),
  while ``ValueError`` / ``CoordinateValidationError`` stay
  ``ValueError``. Preserves the original type signature for callers
  that want to ``except TypeError``.

New test ``test_align_to_coords_preserves_type_errors`` pins the
TypeError pass-through. Suite: 3703 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant