Skip to content

doc/semantics: Lean 4 mechanization of UnifiedStream error semantics#36614

Draft
antiguru wants to merge 141 commits into
MaterializeInc:mainfrom
antiguru:lean-semantics-skeleton
Draft

doc/semantics: Lean 4 mechanization of UnifiedStream error semantics#36614
antiguru wants to merge 141 commits into
MaterializeInc:mainfrom
antiguru:lean-semantics-skeleton

Conversation

@antiguru
Copy link
Copy Markdown
Member

@antiguru antiguru commented May 18, 2026

Lean 4 mechanization of Materialize's error-handling design at doc/developer/semantics/, plus the design doc that drives it at doc/developer/design/20260517_error_handling_semantics.md.

Current state (post-restart)

The skeleton is on the two-diff baseline described in the design doc's "Stream shape" section. Each stream record is (row : List Datum, diff : Int, err_diff : Int) — both ordinary Int diffs, both retractable. Cell-scoped errors live in Datum::Error inside the row; row-escalation moves multiplicity from diff to err_diff; global-scoped errors (absorbing diff marker) are spec-only.

What's in the skeleton today

  • Scalar evaluation: Datum, Expr, Eval, PrimEval (evalAnd / evalOr / evalIfThen / arithmetic / comparison / coalesce), four-value truth tables in Boolean / Laws.
  • Column-reference analyses: ColRefs (colReferencesBoundedBy, colReferencesUnused, colShift, monotonicity).
  • Substitution: Pushdown (Expr.subst, eval_subst).
  • Strictness: Strict (err- and null-propagating classes for binary primitives).
  • Bag layer: Bag (Row := List Datum, Relation := List Row, filterRel / project for simple bag laws).
  • Stream layer (new): StreamStreamRecord {row, diff, err_diff} + filter / project / negate / unionAll / cross, plus reduction lemmas. Filter follows the design doc's Predicates rule exactly (.bool false / .null / .int zero data, .err migrates data into err side). Cross uses the two-component multiplicative rule.

18 build jobs, all green. No sorry.

What's preserved in branch history (not in current tree)

The branch carried an earlier Diff = Int × ErrCount per-error-payload design through ~33 build jobs before being restarted. That iteration produced:

  • The diff-multiplication soundness counterexample against predicate_pushdown over Cross(L, R) (filed as a PR comment).
  • The "stream equivalence" relation needed to recover soundness for the per-payload form.
  • Fusion / pushdown theorems on the per-payload form (FilterFusion2, ProjectFusion2, Demand2).

The design doc's Alternatives section preserves the per-payload form as a live alternative — the choice between single-Int err-diff and per-EvalError map is not yet settled and turns on whether a user-facing SQL surface needs to distinguish error kinds at the row level. The Lean work is the evidence for what each choice costs.

Major design-doc changes in this PR

  • Cell-scoped errors specification (Datum::Error variant).
  • Row-scoped errors via two-diff record shape; DataflowError carries structured payloads.
  • Global-scoped errors via absorbing diff marker.
  • Operator obligations under the two-diff shape — Predicates, Joins, Aggregates rules.
  • Alternatives include carrier-replacement (rejected), absorbing-only diff (rejected), and per-error-payload diff (open).
  • Open questions: storage cost, source-decoder errors with no row, err-side single-Int vs per-EvalError map.

Build

ci/test/lean-semantics.sh — Docker, Lean v4.29.1, lake.

🤖 Generated with Claude Code

antiguru and others added 30 commits May 18, 2026 23:17
Add design doc proposing per-cell `Datum::Error`, row-scoped errors via
the existing `DataflowError`, and a diff-semiring sketch for
collection-scoped errors. Define the four-valued AND/OR truth tables
({TRUE, FALSE, NULL, ERROR}) that the rest of the spec rests on.

Add a v1 Lean 4 mechanization of the boolean fragment at
doc/developer/semantics/. The skeleton models `Datum`, `Expr`,
`evalAnd`/`evalOr`, and proves all 32 cells of the AND/OR truth tables.
The pattern order in `evalAnd`/`evalOr` matches the current Rust
runtime in src/expr/src/scalar/func/variadic.rs; deviating from that
runtime requires a corresponding diff in Boolean.lean.

CI runs lake build in a self-built Docker image (ubuntu:26.04 + elan +
lean-toolchain pin), gated on changes under doc/developer/semantics/.
No Mathlib dependency.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Expand the Lean skeleton with a small set of follow-up theorems that
the optimizer roadmap depends on.

* `Datum.IsErr` propositional predicate plus `DecidablePred` instance,
  for use as a hypothesis in algebraic laws.
* `Expr.not` constructor and `evalNot`, with the 4-cell `NOT` truth
  table and `not_not` (involution on the boolean fragment, no-op on
  `null` and `err`).
* New `Mz/Laws.lean`:
  - `evalAnd_idem`, `evalOr_idem`: idempotence holds unconditionally,
    including on `err` (same error preserved).
  - `evalAnd_comm_of_no_err`, `evalOr_comm_of_no_err`: commutativity
    conditional on neither operand being an error. This is the law an
    optimizer that has run `might_error` analysis can use to justify
    conjunct reordering. Unconditional commutativity fails over
    `(err e₁, err e₂)` with distinct payloads.

Associativity and `might_error` soundness are deferred.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Land the optimizer-facing theorem the previous commit's roadmap called
out as priority one.

* Switch `eval` for `.ifThen` to a strict `evalIfThen` helper. The
  observable `Datum` is identical to the lazy form in this total
  skeleton (both branches are total functions); strictness collapses
  the ifThen proof case to a single `cases` on the condition.
* Redefine `Env.get` by primitive recursion so inductive proofs can
  `cases` on the defining equations directly without going through
  `List.getD`.
* New `Mz/MightError.lean`:
  - Per-operator helper lemmas: `evalAnd_not_err`, `evalOr_not_err`,
    `evalNot_not_err`, `evalIfThen_not_err`.
  - `Expr.might_error` conservative analyzer (literal err taints
    every ancestor; columns assumed err-free).
  - `Env.ErrFree` predicate and `Env.get_not_err` lemma.
  - `might_error_sound`: if `¬e.might_error` and `env.ErrFree` then
    `¬(eval env e).IsErr`. Structural induction on `e`, one helper
    per Expr constructor.
* `Mz/Laws.lean`: lift `evalAnd_comm_of_no_err` and
  `evalOr_comm_of_no_err` through `eval` via `might_error_sound`,
  giving `Expr`-level reorder-safety theorems
  `eval_and_comm_of_no_might_error` and
  `eval_or_comm_of_no_might_error`. These are the laws an optimizer
  would cite when reordering conjuncts in the boolean fragment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* `ErrStrictUnary`, `ErrStrictBinary`, `NullStrictUnary` predicates
  capturing per-position strictness. The cell-scoped analogue of
  PostgreSQL's `STRICT` qualifier on `NULL`, applied to `err`.
* Positive instances: `evalNot` is both err-strict and null-strict;
  the condition slot of `evalIfThen` is err-strict.
* Closure: `ErrStrictUnary.comp` proves strictness is preserved under
  function composition, which is the property an optimizer relies on
  when fusing chains of strict scalar functions into a single MFP
  expression.
* Negative results: `evalAnd` and `evalOr` are not err-strict in
  either argument position. The counterexamples (`FALSE AND ERR`,
  `TRUE OR ERR`) are also canonical regression tests — a future
  change to AND/OR that promoted these cells to `ERR` would break
  exactly these proofs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add `evalCoalesce : List Datum → Datum` and prove the laws that
characterize the proposed cell-scoped coalesce.

Semantics: walk operands left to right; return the first concrete
value. When none exists, apply a `null`-beats-`err` tiebreak — if
any operand was `null`, return `null`; otherwise, return the first
`err`. The state machine in `Coalesce.go` tracks a `seenNull` sticky
bit and a first-`err` payload.

The defining error-rescue property
(`coalesce_err_rescue_bool`) and its symmetric null-rescue
(`coalesce_null_rescue_bool`) together establish that a later
concrete operand rescues an earlier `err` exactly as it rescues an
earlier `null`. This is the explicit, user-controllable error trap
referenced in the design doc.

The `null`-beats-`err` tiebreak (`coalesce_null_then_err`,
`coalesce_err_then_null`) preserves the PostgreSQL behavior users
expect when all operands are NULL, and is dual to the strict-function
rule documented in `Mz/Strict.lean`: strict functions promote `err`
above `null` in per-cell results; `coalesce` is non-strict and
demotes `err` below `null` in the tiebreak.

Defer wiring `evalCoalesce` into `Expr` as a `.coalesce` constructor
until the variadic `And`/`Or` ctors land — the termination story is
shared between them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add the missing identity laws and the variadic-fold counterparts of
the binary AND/OR evaluators.

* `Mz/Laws.lean`: `TRUE` is the two-sided identity for `evalAnd` and
  `FALSE` is the two-sided identity for `evalOr`. The proofs verify
  each cell of the non-identity argument. These identities are also
  the seed values used by the variadic folds.
* `Mz/Variadic.lean`: `evalAndN`, `evalOrN : List Datum → Datum`
  defined by right-fold. Right-fold gives the cons recurrence by
  `rfl` and avoids the associativity argument a left-fold would
  require.
  - Cons recurrence, nil, singleton, and binary equivalence with
    `evalAnd` / `evalOr`. The binary equivalence is the bridge that
    transports every binary truth-table cell and algebraic law to
    the variadic form on lists of length two.
  - Absorption theorems: a single `FALSE` anywhere in the operand
    list forces `evalAndN` to `FALSE`; symmetric statement for
    `TRUE` and `evalOrN`. These justify the runtime short-circuit
    optimization.

Wiring `evalAndN`, `evalOrN`, and `evalCoalesce` into `Expr` as
list-carrying constructors is deferred — the termination story for
the nested inductive is shared between them and is best landed in a
single commit once it's been validated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add `.andN`, `.orN`, and `.coalesce` to `Expr` as variadic
constructors carrying `List Expr`, with the corresponding `eval`
clauses and Expr-level reduction lemmas. The variadic primitives
(`evalAndN`, `evalOrN`, `evalCoalesce`) move to a new
`Mz/PrimEval.lean` to avoid the circular dependency that would
otherwise arise between `Mz/Eval.lean` and the variadic-law files.

File reorganization:

* New `Mz/PrimEval.lean`: every evaluator that operates on `Datum`
  or `List Datum` without referring to `Expr` — the binary boolean
  primitives, the environment, and the variadic primitives.
* `Mz/Eval.lean` shrinks to the `eval : Env → Expr → Datum`
  definition.
* `Mz/Variadic.lean` and `Mz/Coalesce.lean` drop their now-moved
  evaluator definitions and keep only the theorems.
* `Mz/Boolean.lean` and `Mz/Strict.lean` switch their imports to
  `Mz.PrimEval` since they never used the Expr-level evaluator.

`might_error_sound`:

* `induction` cannot be used on the now-nested-inductive `Expr`, so
  the soundness proof is rewritten as a recursive `theorem` that
  pattern-matches the constructor and recurses on subexpressions.
  The signature also makes `env` explicit so the recursive calls
  in compound cases stay readable.
* The list-carrying constructors are handled vacuously: the
  conservative `might_error` for them returns `true` unconditionally,
  so the soundness premise is absurd in those cases. A future
  refinement will tighten the analyzer to inspect operands.

New `Mz/ExprVariadic.lean`:

* `eval_andN`, `eval_orN`, `eval_coalesce` connect the Expr-level
  evaluator to the variadic primitives.
* Empty / singleton cases for all three.
* Binary equivalence: `eval env (.andN [a, b]) = eval env (.and a b)`
  and the dual for `.orN`, transporting `Mz/Variadic.lean`'s binary
  laws through `eval`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ness

Refine `Expr.might_error` for the `andN` and `orN` list-carrying
constructors so the analyzer is no longer maximally conservative for
them. The analyzer now recurses into the operand list via
`Expr.argsMightError`, declared mutually with `Expr.might_error` so
Lean's structural-recursion checker accepts both sides without an
explicit termination measure.

`Expr.argsMightError_of_mem` is the membership-driven introduction
form needed by the soundness proof — it turns "some operand might
error" into the bool fold. Its contrapositive is what extracts a
per-operand non-erroring hypothesis from the analyzer's verdict on
the whole list.

`might_error_sound` for the `.andN` and `.orN` constructors now
performs real work:

* Reduce `eval env (.andN args)` to `evalAndN (args.map (eval env))`
  via the existing `simp only [eval]` pattern.
* Apply the list-level `evalAndN_not_err` (also added in this
  commit) with a per-operand non-erroring hypothesis.
* Extract that per-operand hypothesis by destructuring
  `List.mem_map` and applying `Expr.argsMightError_of_mem`
  contrapositively against `hMe`.
* Recurse via `might_error_sound` on the individual operand.

`.coalesce` is still tainted unconditionally — the rescue analysis
is a separate follow-up since the helper `evalCoalesce_not_err`
needs a different list induction (state-machine over `Coalesce.go`).

`Expr.might_error` and `Expr.argsMightError` are now mutually
recursive, which forces well-founded compilation. Existing `simp
only [Expr.might_error]` patterns in the proof are unaffected
because they target equation lemmas, not definitional reduction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Refine `Expr.might_error` for the `.coalesce` constructor so the
analyzer captures the rescue rule: `coalesce` might error only when
every operand might error (and the operand list is non-empty). A
single statically-safe operand makes the whole expression safe.

Analyzer changes (mutual block):

* `Expr.argsAllMightError`: companion fold to `argsMightError`,
  empty-list base `true`, cons case `e.might_error && rest...`.
* `.coalesce []` returns `false` (empty coalesce is `.null`, never
  errors).
* `.coalesce (a :: rest)` returns `a.might_error && argsAllMightError
  rest`, which is equal by definition to `argsAllMightError (a :: rest)`.

Soundness chain for the `.coalesce` case:

1. `Expr.exists_safe_of_not_argsAllMightError` extracts a
   statically-safe operand from the negation of the analyzer.
2. `might_error_sound` recurses on that operand and produces a
   not-`IsErr` witness on its evaluated value.
3. `Coalesce.go_not_err` (new) is the state-machine lemma: any
   `Coalesce.go` invocation whose remaining list contains at least
   one non-erroring datum (or whose `seenNull` is already set) does
   not return an error. Proof is structural recursion on the list,
   plumbing the safety witness through the `.null` and `.err` arms.
4. `evalCoalesce_not_err_of_some_safe` packages the above into the
   surface lemma soundness invokes.

`Coalesce.go` is made non-private in `Mz/PrimEval.lean` so the
state-machine lemma can reference it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add the `Expr`-level absorption theorems that an optimizer cites when
folding a variadic `AND` containing a `FALSE` operand to `FALSE`
(symmetric for `OR` with `TRUE`).

Two forms each:

* Semantic premise: `(∃ e ∈ args, eval env e = .bool false) →
  eval env (.andN args) = .bool false`. Sufficient for any operand
  the optimizer has reduced to a constant boolean.
* Syntactic premise: `Expr.lit (.bool false) ∈ args → eval env
  (.andN args) = .bool false`. The corollary specialized to the case
  where a literal `.bool false` is syntactically present in the
  operand list. Useful for simple constant folding.

Each proof reduces `eval env (.andN args)` to
`evalAndN (args.map (eval env))` via `eval_andN`, then witnesses
membership of `.bool false` in the mapped list through
`List.mem_map`, then invokes `evalAndN_false_absorbs` from
`Mz/Variadic.lean`. The `Or`/`True` direction is symmetric.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Model relations as `List Row` and define the two basic relational
operators on top of the per-row evaluator.

* `filterRel pred rel` keeps rows whose predicate evaluates to
  `.bool true`. Rows evaluating to `.bool false`, `.null`, or
  `.err` are dropped. The skeleton silently drops `err` rows; a
  real implementation would route them to a separate error
  collection. Noted in the docstring as a known modelling gap.
* `project es rel` evaluates each scalar in `es` against every row
  and produces a relation whose row width is `es.length`.

Laws:

* `filterRel_idem`: applying the same predicate twice is the same
  as applying it once. Proof: `List.filter_filter` + `Bool.and_self`.
* `filterRel_comm`: filter commutes with filter. Proof:
  `List.filter_filter` twice + `Bool.and_comm`.
* `project_length`: projection preserves cardinality. Direct
  consequence of `List.length_map`.
* `project_nil`: the empty projection collapses every row to `[]`.

All proofs are one-liners over Lean core's list lemmas; the laws
themselves are the first relational-level rewrites an optimizer
cites. Predicate pushdown across projection, joins, and aggregates
remain follow-ups.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Model the dataflow's data and error collections as a pair carried
together through operators, mirroring the structure of the real
Materialize dataflow where erroring rows are routed to a separate
collection rather than silently dropped.

* `BagStream` is a structure with `data : Relation` and
  `errors : List EvalError`.
* `BagStream.ofRelation` injects a plain relation with no
  accumulated errors.
* `BagStream.filter pred s` evaluates `pred` on every row of
  `s.data`. Survivors (`.bool true`) stay in the data field;
  erroring rows contribute their payloads to the error field via
  `errorRows`; everything else is dropped.

Two helper lemmas carry the structural fact "predicate evaluated to
`.bool true` on every survivor of `filterRel`":

* `rows_in_filterRel_eval_to_true` unfolds the survival condition.
* `errorRows_eq_nil_of_all_true` says a relation where every row
  evaluates to `.bool true` contributes no errors.

`errorRows_filterRel` combines them and is the key fact behind
`BagStream.filter_idem` — the second pass of an idempotent filter
sees only survivors of the first, so it contributes no new errors.
The overall stream is therefore unchanged on the second filter, at
both the data and the error level.

`BagStream.project` and stream-level filter commutativity follow in
later iterations; commutativity at the error level requires
multiset equality on `List EvalError` since list order differs
across permutations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add the classical relational rewrite:

  filterRel p (project es rel) = project es (filterRel (p.subst es) rel)

The rewrite lets an optimizer move a `WHERE` clause through a
`SELECT` clause by substituting the projection's source scalars into
the predicate's column references.

* `Expr.subst es e` replaces each `Expr.col i` in `e` with the i-th
  scalar of `es`, with out-of-bounds references defaulting to
  `.lit .null` so the result still evaluates to `.null`. Defined
  mutually with `Expr.substArgs` for the nested-list constructors
  (`andN`, `orN`, `coalesce`).
* `Expr.substArgs_eq_map` bridges the recursive helper to the more
  ergonomic `args.map (·.subst es)` form for use in proofs.
* `Env.get_map_eval` is the column-lookup compatibility lemma:
  reading column `i` from the projected row equals evaluating the
  i-th projection scalar against the original row.
* `eval_subst` is the headline correctness theorem: substituting and
  then evaluating against the original row equals evaluating the
  unsubstituted expression against the projected row. Proof is
  structural pattern recursion mirroring `Expr.subst`.
* `filterRel_pushdown_project` packages `eval_subst` plus
  `List.filter_map` + `List.filter_congr` into the relational form
  an optimizer cites.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Model the diff-field encoding of collection-scoped (global) errors
proposed in the design doc. The standard differential dataflow diff
type (typically `ℤ` for multiset counts) is extended with an
absorbing `error` element; any sum or product involving `error`
yields `error`, encoding "this collection is invalid at time `t`".

* `DiffWithError α` is a two-constructor inductive: `val (x : α)` for
  ordinary diffs, `error` for the absorbing marker.
* `Add`, `Mul`, `Zero`, `One` instances lift the underlying
  operations and respect the absorbing behavior.
* Absorption laws (`error_add_left`, `error_add_right`,
  `error_mul_left`, `error_mul_right`) are the defining property.
* Commutativity, associativity, and left-distributivity are proved
  parameterically over the underlying `α`: each lemma takes the
  corresponding law on `α` as a hypothesis and discharges the
  `DiffWithError` version by case analysis.

The lemmas are not packaged as `Semiring`/`CommRing` typeclass
instances because the skeleton avoids depending on Mathlib for
build-time reasons. Adding the typeclass wiring is straightforward
once Mathlib is on the dependency list.

Tying `DiffWithError` to an actual `(Row, Time, Diff)` triple stream
and proving the operator-level propagation theorems is the next
step; the present file lays the algebraic groundwork those theorems
rely on.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Model the spec's preferred encoding where data rows and errors flow
through one collection rather than the `(data, errors)` pair carried
by `BagStream`. The split form in `Mz/ErrStream.lean` mirrors the
current Materialize runtime; the unified form mirrors what the
diff-semiring extension naturally produces (any record with an
`error` diff is a row-scoped error, and the same encoding extends
to collection-scoped errors).

* `UnifiedRow` is a sum type — `row (r : Row)` or `err (e :
  EvalError)`. The bare `err` variant preserves the current
  property that errors carry no row context.
* `UnifiedStream := List UnifiedRow`.
* `UnifiedStream.ofBag` packs a `BagStream` (data rows first,
  errors second). `UnifiedStream.split` is its inverse, splitting
  the carrier back via `filterMap pickRow` / `filterMap pickErr`.
* `UnifiedStream.filter` runs the predicate per record, routing
  erroring rows to `.err` records in place. Existing `err` records
  pass through unchanged — the carrier handles error propagation
  without per-operator boilerplate.

Round-trip theorem `UnifiedStream.split_ofBag : split (ofBag s) =
s` proved at both the data and error field level, using four
private filterMap-over-tagged-list helpers and structural
induction. The cross-direction `filter ∘ ofBag ≈ ofBag ∘ filter` is
left for future work — equivalence is only up to multiset equality
on errors because `ofBag` fixes a concat order between data and
errors, while filters that interleave the two would produce a
different list order.

`@[ext]` added to `BagStream` so the round-trip proof can use
`apply BagStream.ext`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`List.filterMap` was passed to four `simp` invocations in the
filterMap-of-mapped-list helpers but never actually used in the
rewriting — the cons recurrence and the `pickRow` / `pickErr` arms
combined with the IH are enough. Drop the unused arg; the
linter-clean.
Model the spec rule that SQL aggregates such as `SUM`, `MIN`, `MAX`,
and `AVG` propagate `err` like strict scalar functions. NULLs are
skipped (matching PostgreSQL's "ignore NULLs" reading); errors are
propagated, with the first one in scan order winning.

* `aggCountNonNull`: `COUNT(expr)` — counts rows whose value is
  neither `NULL` nor `err`. Matches PostgreSQL.
* `aggStrict f`: variadic strict reduction parametric over the
  combiner `f`. Empty list (or list of only `NULL`s) returns `NULL`;
  any `err` returns the first `err`; non-`NULL`/`err` values are
  combined via `f`.

Two propagation theorems:

* `aggStrict_err`: any `err` input forces an `err` output. Proof by
  structural recursion on the list; the `.bool` case branches on
  `aggStrict f rest` and shows the `err` arm of the inner match is
  the one that fires.
* `aggStrict_no_err`: dual statement under the additional
  hypothesis that the combiner preserves "no-err". Captures the
  precondition an aggregate operator can use to guarantee its
  output is not an error when its column is error-free.

`try_*` aggregates (the non-strict variants that swallow `err` into
`NULL`) are future work; they satisfy a coalesce-style law rather
than strict propagation and warrant their own file once the spec
side is clearer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Model the proposed `try_sum`, `try_min`, etc. that swallow `err`
into `NULL` instead of propagating. Defined as a post-pass coalesce
on `aggStrict`:

  aggTry f xs = match aggStrict f xs with
                | err _ => null
                | r     => r

Equivalent reading via the existing `evalCoalesce`:

  aggTry f xs = evalCoalesce [aggStrict f xs, .null]

The defining property `aggTry_no_err` says the result is never an
`err`. The companion `aggTry_eq_aggStrict_of_no_err` says the
non-strict variant agrees with the strict one whenever the strict
one would not have erred — so an optimizer that has already proved
the inputs error-free can swap `aggTry` for `aggStrict` (and vice
versa). Both lemmas reduce to case analysis on `aggStrict f xs`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Model the diff-only slice of differential dataflow's `compact` /
consolidation operation: summing a list of `DiffWithError α` values
and showing that an `error` diff absorbs the consolidated sum.

* `DiffWithError.sumAll` sums a list of diffs starting from `0`,
  right-associative. The result lives in `DiffWithError α` for any
  base `α` with `Zero` and `Add` instances.
* `sumAll_eq_error_of_mem`: the headline absorption theorem. If
  `error` appears anywhere in the list, the consolidated sum is
  `error`. Proof walks the list and uses the absorption laws from
  `Mz/DiffSemiring.lean` at the matching cons.
* `sumAll_val_of_all_val`: companion no-error-preservation theorem.
  An all-`val` list sums to `val` of some `α`.

The full `compact` operator also buckets records by `(row, time)`;
that bucketing is orthogonal to the absorption argument and would
require `DecidableEq Row` plus a time model, both follow-ups. The
per-bucket inner sum modeled here is what an operator-level proof
would invoke once those pieces are in place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add `TimedRecord` and `TimedStream` to model differential
dataflow's record format with errors. `Time := Nat`; the diff is
`DiffWithError α` parametric in the base diff type.

Operations:

* `TimedStream.consolidateAll` sums every diff in the stream,
  ignoring row and time. The collection-wide diff.
* `TimedStream.consolidateAt t` sums every diff at time `t`,
  ignoring row. The per-time collection diff.

Both reduce to `DiffWithError.sumAll`, so the absorption laws from
`Mz/Consolidate.lean` transport:

* `consolidateAll_eq_error_of_mem`: any `error`-carrying record
  forces the all-stream consolidation to `error`.
* `consolidateAt_eq_error_of_mem`: any `error`-carrying record at
  time `t` forces the per-time consolidation at `t` to `error`.

Per-`(row, time)` bucketing — the form `compact` actually uses in
the runtime — is the next refinement and requires `DecidableEq` on
`Row`. The current sums are the collection-global diffs at each
time slice, which is what operator-level proofs invoke once the
bucketing is layered on.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Stream

Two-input relational join modeled on the unified single-collection
stream. Cross product is the building block; predicate join is
cross followed by `UnifiedStream.filter`.

* `UnifiedStream.cross l r` produces one output record per
  `(lu, ru)` pair: both rows ⇒ concatenated row; either side `err`
  ⇒ that side's `err` (left wins on conflict, matching the binary
  AND's first-error rule).
* `UnifiedStream.join pred l r := (cross l r).filter pred`.

Error propagation falls out of the carrier: an `err` record on
either side contributes one `err` to the output per record on the
other side, matching the diff-semiring's `error * diff = error`
intuition.

Theorems: `cross_nil_left` and `cross_nil_right` cover the empty
cases (joining anything with the empty stream is empty). The
cardinality theorem `(cross l r).length = l.length * r.length` is
deferred — needs `List.length_flatMap` + Nat arithmetic that the
current skeleton's lemma toolkit handles awkwardly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Partition a relation by an evaluated key expression and aggregate
per group. Closes the SQL `SELECT k, AGG(v) FROM r GROUP BY k` shape
the rest of the algebra implies.

* `Datum` derives `DecidableEq` (via the already-decidable
  `EvalError` payload) so keys can be compared at runtime.
* `groupBy keyExpr rel` walks rows, evaluates the key on each, and
  inserts into the existing group with that key or starts a new
  one. Output is `List (Datum × Relation)`.
* `aggregateBy keyExpr valExpr f rel` applies `aggStrict` to each
  group's evaluated value column, producing `List (Datum × Datum)`
  — one aggregated value per group.

Spec divergence on `err` keys documented inline: the skeleton uses
standard `DecidableEq Datum`, so two `Datum.err e` values with the
same payload collapse into one group. The spec requires every err
key to be its own group. The natural refinement is a custom
`Datum.groupKeyEq` that returns `false` whenever either side is an
err; left for future work.

Theorems for now cover the trivial cases (empty, singleton).
Cardinality (`sum of group sizes = rel.length`) is the next
concrete law and is deferred until the lemma toolkit picks up the
needed Nat / List arithmetic helpers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add `Datum.groupKeyEq` — group-key equivalence that returns `false`
whenever either side is `.err`, so two error keys never coalesce
even if their payloads happen to match. Build `groupByErrDistinct`
and `aggregateByErrDistinct` on top, leaving the existing
`groupBy` / `aggregateBy` (which use derived `DecidableEq Datum`)
in place as the more permissive variant.

Theorems:
* `Datum.groupKeyEq_err_left`: simp lemma reducing the err-left
  case to `false`.
* `insertIntoDistinct_err`: inserting an err-keyed row into any
  group list appends a fresh singleton group at the end.
* `groupByErrDistinct_length_of_all_err`: when every row's key
  evaluates to err, the output has one group per row — closes the
  spec-divergence note from the previous commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`BagStream.project es s` projects each row in `s.data` through the
list of scalar expressions `es`. A row stays in the data collection
only when every scalar succeeds on it; otherwise the row's err
payloads (one per erroring scalar) are appended to the error
collection.

Helpers:
* `rowAllSafe`: every projected scalar succeeds on a row.
* `rowErrs`: collect every err payload from one row's projections.
* `projectErrs`: aggregate `rowErrs` across the relation.

Laws:
* `BagStream.project_data` / `_errors` — per-field reductions.
* `BagStream.project_nil_es` — empty projection list keeps every
  row and produces width-zero output rows.
* `BagStream.project_empty_data` — empty data collection projects
  to empty data, preserving input errors.
* `rowErrs_nil_of_all_safe` and `projectErrs_eq_nil_of_all_safe` —
  when every scalar succeeds on every row, the error collection is
  unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`UnifiedStream.cross_length` proves the headline relational identity
`(cross l r).length = l.length * r.length`. Cross product on the
unified carrier produces exactly one output record per `(l, r)`
pair regardless of which side carries an error — every err
contributes one err record per element of the other side, matching
the diff-semiring's `error * d = error`.

Companion `UnifiedStream.filter_length_le` proves the predicate
filter on the unified stream is non-expanding (each input produces
zero or one output). Composing the two gives the corollary
`UnifiedStream.join_length_le`: a join's output length is bounded
above by `l.length * r.length`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`totalRows g = sum of g.rows.length` is the metric. Theorems:

* `totalRows_insertInto`: a single insert adds exactly one row,
  whether the key matches an existing group or creates a new one.
* `totalRows_insertIntoDistinct`: same bookkeeping for the
  err-distinct variant.
* `totalRows_groupBy`: sum of group sizes equals the input
  relation's length — partitioning loses and duplicates nothing.
* `totalRows_groupByErrDistinct`: the err-distinct variant
  preserves the same invariant.

Use an explicit recursive `totalRows` definition instead of
`((map ·.2.length).sum`. The latter trips `omega` because the
elaborator and the simp set produce two syntactically distinct
forms (`(·.2.length)` vs `List.length ·.snd`) that omega treats
as independent atoms.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Promote `UnifiedStream` from `List UnifiedRow` to
`List (UnifiedRow × DiffWithError Int)`. Every record now carries
a differential-dataflow diff augmented with the absorbing `.error`
marker. Row-scoped errors still propagate through the `UnifiedRow`
carrier; collection-scoped errors propagate through diff
multiplication / addition.

Changes:

* `UnifiedStream.ofBag` tags every bag record with diff `.val 1`.
* `UnifiedStream.split` discards the diff component (round trip
  still goes through because `ofBag` only emits `.val 1` diffs;
  the cross-direction loses information for diffs ≠ `.val 1`).
* `UnifiedStream.filter` preserves the diff of survivors; rerouted
  rows (predicate errs) keep their multiplicity.
* `UnifiedStream.cross` combines carriers via the new
  `combineCarrier` helper and multiplies diffs through
  `DiffWithError`'s `Mul` instance. A `.error` diff on either
  side absorbs the product diff via
  `DiffWithError.error_mul_{left,right}`.

New theorem `cross_diff_error_left` witnesses the absorption:
crossing a left-side `.error` diff with any record on the right
produces a `.error` diff in the output. Existing theorems
(`cross_length`, `filter_length_le`, `join_length_le`,
`split_ofBag`) carry through to the new representation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tighten diff propagation in `UnifiedStream.filter` and add the
symmetric `cross` absorption theorem.

`filter` now special-cases records carrying a `.error` diff,
passing them through unconditionally regardless of predicate
outcome. The previous version dropped such records when the
predicate evaluated to `.bool false` or `.null`, which violates
the semiring law that `.error` must absorb every downstream
operator. `filter_length_le` still holds — each input still
produces at most one output.

Three new theorems witness the absorption:

* `cross_diff_error_right`: symmetric counterpart to the existing
  `cross_diff_error_left` — a `.error` diff on the right side
  forces the output diff to `.error` for every record on the left.
* `filter_preserves_error_diff`: a record `(uc, .error)` in the
  input survives the filter as `(uc, .error)` in the output, no
  matter what the predicate is.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`UnifiedStream.consolidate` buckets unified-stream records by
carrier (data row or row-scoped err) and sums per-bucket diffs.
Headline theorem `consolidate_preserves_error` lifts the
diff-semiring absorption rule from `Mz/Consolidate.lean` to the
unified stream: an `.error` diff anywhere in the input forces a
`.error` diff in the consolidated output for that carrier.

Required:

* `deriving DecidableEq` on `UnifiedRow` (Row already has it via
  `Datum`, EvalError too).
* Two helper lemmas — `consolidateInto_error_diff` (inserting an
  `.error` diff yields `.error` for the bucket) and
  `consolidateInto_preserves_error_mem` (a pre-existing
  `.error` record survives every subsequent insert).

Adding times reduces to running `consolidate` inside each time
slice; that lifting is mechanical and left for the per-`(row, time)`
follow-up alongside `Mz/Triple.lean`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two companion theorems for `UnifiedStream.consolidate`.

`consolidate_length_le`: bucketing only merges, never expands.
The output has at most as many records as the input. Helper
`consolidateInto_length_le_succ` bounds a single insert at
`us.length + 1`.

`consolidate_no_error`: if every input diff is `.val n`, every
output diff is `.val m`. The diff-semiring's `.val + .val = .val
(· + ·)` keeps the consolidated buckets in the ordinary `Int`
slice, so `.error` is the only source of absorption.

Together with the existing `consolidate_preserves_error`, the
three theorems pin down the consolidation operator: absorption
on errors, non-expansion on cardinality, and stability on
error-free input.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
antiguru and others added 5 commits May 18, 2026 23:17
`UnifiedStream.filter_comm`: filter p (filter q us) =
filter q (filter p us) when neither predicate errors on the
input's data rows. Reduces to `filter_filter_fuse` applied both
ways, then equates `.and q p` and `.and p q` via
`evalAnd_comm_of_no_err`.

New helper `filter_eval_eq`: two filters with eval-equivalent
predicates on every data row produce equal outputs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cite filter_idem (unconditional) and filter_comm (under
err-freedom) alongside filter_filter_fuse.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`UnifiedStream.filter_project_pushdown`: filter p ∘ project es =
project es ∘ filter (p.subst es), under projsAllSafe es us.
Lifts BagStream.project_filter_pushdown_data (data-side only)
to the unified stream — errs flow through both pipelines
symmetrically under the safety hypothesis, so no data/error
asymmetry remains.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Note filter_project_pushdown in the Modelable row for
projection-lifting passes. Still missing: project_cross_pushdown
(harder — depends on column-split semantics).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`clampToOne_id_of_one`: clampToOne is identity when every
record's diff is already .val 1 or .error. Companion to
clampPositive_id_of_positive — same threshold-elision pattern,
but for the set-semantics post-pass instead of the bag-semantics
one.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@antiguru
Copy link
Copy Markdown
Member Author

Foundational gaps

Ranked by what's currently blocking further coverage.

Tier 1: blocks most remaining modelable passes

  • Stream equivalence relation. UnifiedStream is List, so equality is list-equality. Many natural laws only hold up to multiset / bag equivalence:

    • unionAll_comm — fails as list eq (concat order).
    • consolidate_idemconsolidate reverses order on noDup inputs (see SetOps.lean algorithm), so consolidate ∘ consolidate ≠ consolidate.
    • cross_comm — Cartesian product is symmetric only as multiset.
    • unionAll_distrib_cross_right — interleaves vs groups.

    Fix: define UnifiedStream.Equiv ≈ List.Perm (or stronger: zero-diff and same-carrier-sum quotient), or switch to Multiset (UnifiedRow × DiffWithError Int).

  • Reduce operator (group-by aggregate). Mz/Aggregate.lean exists on List Datum only. No UnifiedStream lift. Single largest infra gap — blocks fusion/reduce, reduce_elision, reduce_reduction, reduction_pushdown. Needs group-key expressions, aggregate functions (sum / count / min / max / ...), and per-group error handling.

  • Map separate from project. UnifiedStream.project currently replaces the row with es.map (eval r). Rust's Map appends columns. Without separate Map, MapFilterProject and canonicalize_mfp can't be modeled cleanly.

Tier 2: specific pass clusters

  • Schema / column types. Rows are List Datum with no type info. Blocks:

    • non_null_requirements — needs per-column nullability lattice.
    • column_knowledge — needs per-column {literal, nullable, type} lattice.
    • Width-correct join statements (width is implicit and unchecked).
    • SQL type-rejection (currently evalAnd .int .int returns .int n for totality).
  • Other missing operators:

    • TopK (sort + limit, needs ordering on Row).
    • FlatMap (table-valued function).
    • Constant (literal stream — would unlock fold_constants).
    • Get / Let / LetRec (named bindings, fixpoint for recursive queries).
    • ArrangeBy (key-indexed view — physical but cited by passes).
  • Equivalence-class lattice for equivalence_propagation. Currently only use-sites modelable.

  • Monotonicity / NoRetraction predicate for monotonic.rs. Streams that never retract.

Tier 3: correctness reach

  • Bridge to actual Materialize code. No translation function `MirRelationExpr → UnifiedStream` composition, no soundness statement "the Rust optimizer's rewrites are valid under this semantics." All current theorems are about an idealized model.

  • Real time / frontier semantics. TimedUnifiedStream has Nat times. Missing:

    • Partial order on times.
    • Frontier (antichain) type.
    • since / upper bound machinery.
    • Capability / retraction-as-of-time laws.
  • Fixpoint semantics. No model of LetRec / iterate. WITH RECURSIVE is invisible to the spec.

  • Differential dataflow proper. cross is one-shot Cartesian product, not arrangement-based. No worker partitioning. Mostly physical, but the logical retraction semantics deserve modeling.

Recommendation order

Tier 1 stream-equivalence + Reduce + separate Map unlocks the most. Schema enables a different class of analyses cleanly. Bridge to MirRelationExpr is the biggest reach but the only way the spec becomes load-bearing for real review.

antiguru and others added 5 commits May 20, 2026 11:45
Adds an Int × ErrCount component to the diff so row-scoped errs
preserve row content AND retract through normal diff arithmetic.
Replaces the previous design where row-scoped errs went through
DataflowError-with-row-drop (lossy) or absorbing-marker (terminal).

Two-layer encoding:
* val (Int × ErrCount): retractable, used for row-scoped errs.
* global: absorbing, used for collection-scoped errs only.

Covers WHERE / join-predicate err semantics, multiplication rule
under cross/join, retraction symmetry, and the rejected
alternatives (carrier replacement, absorbing-only).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Foundation for the design-doc refactor toward non-absorbing
row-scoped errs.

ErrCount := EvalError → Int (pointwise add/neg, scalar mul, single).
Diff := { val : Int, errs : ErrCount } with full commutative-ring-like
algebra:
* additive: assoc / comm / zero / neg.
* multiplicative: (a, m) * (b, n) = (a*b, a • n + b • m).
* distributive: mul_add / add_mul.
* neg_mul / mul_neg.

Zero (0, 0) absorbs *. One (1, 0) is identity for *.

New file Mz/DiffErrCount.lean. Adds 33rd build job. The earlier
DiffWithError (absorbing) stays in place; the next commit
repurposes it as DiffWithGlobal (collection-scoped only).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`DiffWithGlobal := val (Diff) | global` — absorbing only on
`global`. Used for collection-scoped errs that cannot retract.
The val side wraps the row-scoped `Diff` (Int × ErrCount) from
DiffErrCount.lean, which is fully retractable.

Mirrors DiffSemiring.lean's absorbing API at the `global` element:
absorption laws on +/* (both sides), commutativity, associativity,
distributivity, negation, the converse `add_eq_global_left_or_right`.

The existing absorbing `DiffWithError α` stays in place; downstream
files migrate to DiffWithGlobal in subsequent commits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Parallel module to UnifiedStream.lean. Carrier := Row (no err
constructor); row-scoped errs live in the diff's ErrCount
component, collection-scoped errs in DiffWithGlobal.global.

* filter: predicate err converts row's valid-count copies into
  err-count copies via ErrCount.single. Row content preserved.
* project: rowAllSafe → row replaced by es.map (eval r), diff
  unchanged. Unsafe → row preserved, valid count zeroed, per-
  erroring-scalar err counts via rowErrCount helper.
* global diff passes through both operators unchanged.

Includes filter_nil, filter_append, project_nil_stream,
project_append reduction lemmas.

Old UnifiedStream module retained; downstream operators migrate
incrementally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three parallel operator modules over UnifiedStream2 (new diff-
based err encoding):

* Join2: cross, join. Carrier-row concat replaces combineCarrier.
  Diff via DiffWithGlobal.mul. Includes cross_nil_left/right,
  cross_append_left, cross_cons_left, cross_length.

* SetOps2: unionAll, negate, exceptAll. Subset of old SetOps;
  clampPositive/clampToOne/distinct/intersectAll defer to later.
  Includes unionAll_assoc, negate_negate (via
  DiffWithGlobal.neg_neg), negate_append.

* UnifiedConsolidate2: bucket-by-row consolidation. Row carrier
  is plain Row (no err marker), so consolidateInto keys on Row's
  DecidableEq. Includes consolidateInto_match/skip and
  consolidate_singleton.

38 lean build jobs all green. Old modules untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@mgree mgree left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some notes from our conversation.

Comment thread doc/developer/semantics/Mz/Datum.lean Outdated
skeleton's variants are intentionally small; production
`EvalError` (in `src/expr/src/scalar.rs`) has many more. -/
inductive EvalError
| placeholder
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop this?

Comment thread doc/developer/semantics/Mz/Expr.lean Outdated

* `lit d`: literal datum.
* `col i`: reference to column `i` in the surrounding environment.
* `and a b`, `or a b`: binary logical conjunction and disjunction.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea why we'd have this AND andN.

Comment thread doc/developer/semantics/Mz/Expr.lean Outdated

A minimal subset of `MirScalarExpr` (see `src/expr/src/scalar.rs`).

The skeleton uses binary `and` and `or` rather than the variadic form
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This graf is confusing. We have both in the AST, for some reason, but they're both defined in terms of primitive, binary, false-before-error-before-null semantics for AND.

| (.row r, d) =>
match eval r pred with
| .bool true => [(.row r, d)]
| .err e => [(.err e, d)]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be (.row r, .error).

def UnifiedStream.filter (pred : Expr) (us : UnifiedStream) : UnifiedStream :=
us.flatMap fun ud => match ud with
| (_, .error) => [ud]
| (.err e, d) => [(.err e, d)]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems weird to have a notion of row-level error as opposed to a row with Datum::error in it.


namespace Mz

inductive UnifiedRow where
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not right: if we're using Datum::Error, then I don't think we need a row-level error. (As we talked about, this is the correct model for mz as she exists, but not for the version with Datum::Error.)

antiguru and others added 4 commits May 20, 2026 16:08
* TimedConsolidate2: TimedUnifiedStream2 := List (Row × Nat ×
  DiffWithGlobal). advanceFrontier (max-based), atTime (filterMap
  drop), consolidateAtTime composing UnifiedStream2.consolidate.

* Consolidate2: DiffWithGlobal.sumAll right-fold from 0. Plus
  sumAll_eq_global_of_mem (absorption), sumAll_val_of_all_val
  (no-escalation), sumAll_global_inv (reverse direction).

40 lean build jobs green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TimedUnifiedStream2.consolidateAll + consolidateAtTimeFlat.
Cite DiffWithGlobal.sumAll_eq_global_of_mem and sumAll_global_inv
for absorption + inversion. Iff forms wrap both directions.

41 lean build jobs green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
FilterFusion2 (331 lines):
* predNoRowErr — quantifies over val records only (.global
  unconstrained).
* filter_filter_fuse, filter_idem (unconditional!), filter_comm,
  filter_eval_eq.
* filter_idem now needs no hypothesis: filter operates only on
  the diff, so a second pass sees the same row and same eval
  result; ErrCount.single _ 0 absorbs cleanly into prior errs.

Demand2 (293 lines):
* replaceAtRow always touches the row (no err-carrier carve-out).
* filter_replaceAtRow_of_unused — full commute under
  colReferencesUnused.
* project_replaceAtRow_of_unused — full commute under
  argsColRefUnusedList2 + es.length ≤ n. Width side condition
  handles project's row-rewrite asymmetry: when safe, output row
  is es-mapped and replaceAtRow on output touches a different
  position than on input. Putting n out-of-bounds in the projected
  row makes Env.replaceAt the identity there.

argsColRefUnusedList2 renamed to avoid clash with Demand.lean's
identically-named predicate.

43 lean build jobs green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* projsAllSafe predicate + tail/head helpers.
* project_project_fuse — strengthened from old: needs both
  `projsAllSafe es us` AND `projsAllSafe (es'.map (·.subst es)) us`.
  The second hypothesis is needed because new project preserves
  the row on unsafe — without it, fused unsafe-branch carrier (r)
  disagrees with step1-safe-then-step2-unsafe carrier
  (es.map (eval r)). Equivalent to es' safe on every projected row.
* filter_project_pushdown — matches old signature exactly.

JoinPushdown2 omitted — the original signature is provably false
under the new diff algebra: cross's mul rule `(a*b, a•n + b•m)`
loses the right-side err counts when filter pushes down (filter
zeroes left val before cross). A correct version needs a
`NoRowErrs r` hypothesis or a quotient-by-consolidation
equivalence. Deferred.

44 lean jobs green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@antiguru
Copy link
Copy Markdown
Member Author

Soundness finding: predicate pushdown over Cross(L, R) is unsound under the new diff semantics

The Lean refactor toward the design doc's row-scoped retractable err encoding (Diff = Int × ErrCount) surfaced a real semantic gap. The optimizer's predicate_pushdown rewrite filter pred (cross l r) ↝ cross (filter pred l) r (with pred referencing left columns only) does not preserve the output collection's err multiplicities.

Concrete counterexample

l = [(la, val ⟨1, 0⟩)]                 -- 1 valid copy of la, no err counts
r = [(rb, val ⟨0, single e 1⟩)]        -- 0 valid copies of rb + 1 err copy under err e
pred = lit (bool false)                -- trivially false on every row

Cross mul rule: (a, m) * (b, n) = (a*b, a • n + b • m). So
(1, 0) * (0, single e 1) = (0, single e 1) — the valid la joined with the err copy of rb produces one err copy of la++rb under err e. The err propagates through cross as you'd expect.

  • LHS filter pred (cross l r) = [(la++rb, val ⟨0, single e 1⟩)]. Filter on a bool false eval zeroes val (already 0) and keeps the err count.

  • RHS cross (filter pred l) r = [(la++rb, val ⟨0, 0⟩)]. filter pred l zeroes left val. Then mul (0, 0) * (0, single e 1) = (0, 0 • single e 1 + 0 • 0) = (0, 0). The err count is dropped.

Mismatch: LHS preserves the err under e; RHS loses it.

Root cause

Cross's mul rule scales right-side err counts by left-side val count. Filter pushdown zeroes left val before crossing — so right-side err copies that were going to be carried by left's valid copies disappear.

Why this is invisible today

In the current production model row-scoped errs route to DataflowError. The err collection of r flows to the join output regardless of the filter — it's a separate stream, not a multiplicity inside the diff. The pushdown rewrite operates only on the data collection, which agrees with the rewrite by construction.

Once row-scoped errs become first-class diff multiplicities (the design's whole point — preserve row content + retract through normal arithmetic), the data path and err path are unified, and the multiplicative rule must account for both. The rewrite as written then changes the output.

Options

  1. Guard the optimizer. predicate_pushdown should not fire over Cross(L, R) when R may carry err diffs (or when its errs component is non-empty). Static analysis can be conservative: assume any operator output that didn't pass through escalateRowErrs may carry err counts.

  2. Add a compensating term. Pushdown emits cross filter_dropped_l r_errs as a side stream and unions into the output. Algebraically:

    filter pred (cross l r) = cross (filter pred l) r ⊎ (compensation)

    where the compensation captures what falls through. Cleaner in theory, harder in practice.

  3. Stream equivalence quotient. Pushdown holds up to consolidation — same logical collection, different list representation. Requires defining Stream.Equiv (multiset / bag equivalence) as a first-class relation. This is the missing Tier 1 foundation mentioned in the earlier gaps comment.

  4. Restrict the diff semantics. If "row-scoped err multiplicities should never survive past a filter on adjacent dimensions," cross mul rule could drop right errs when left val is zero. But that breaks distributivity and the standard diff-semiring laws — likely worse than the bug.

Recommendation

Option 1 (guard) is the safest near-term move. Option 3 (quotient) is the right long-term answer but requires the foundational work listed in the prior comment.

The design doc should specify which view it picks. Right now §"Joins" says a join predicate's err is symmetric with WHERE's err, but it doesn't address how pushdown rewrites interact with diff-scoped err counts. The model needs that resolution before the optimizer can target the new encoding.

Filed from the Lean attempt: the proof of filter_cross_pushdown_left (which previously succeeded under the old absorbing-only IsPureData hypothesis) does not go through here. Old hypothesis ruled out the counterexample by construction; the new diff makes it expressible.

antiguru and others added 13 commits May 22, 2026 06:52
Adds a "Stream equivalence" subsection between Global-scoped errors
and SQL error semantics. Defines the denotational equality on
UnifiedStream as:

* data-multiplicity equality (per-row sum of val counts), plus
* global-marker equality.

Per-row err counts (Diff.errs) drop out of the equivalence relation.

This resolves the predicate_pushdown over Cross(L, R) soundness
gap surfaced by the Lean refactor. Under strict list equality the
rewrite is unsound — cross's mul rule loses right-side err counts
when filter zeroes left val first. Under the new equivalence the
rewrite is sound because data multiplicities are unchanged and SQL
observes the data side.

Open question added: user-facing surface for err counts, if any.
The choice constrains optimizer rewrites.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Defines the denotational equality on UnifiedStream2:

* dataMult s r — per-row sum of valid counts.
* hasGlobal s — whether any record carries .global.
* Equiv s t — dataMult-agreement across all rows + hasGlobal-agreement.

Per-row err counts (Diff.errs) drop out of Equiv. Two streams
agreeing on data multiplicities and global status are equivalent
regardless of how ErrCount is distributed.

Foundation lemmas shipped:
* Equiv.refl / symm / trans (+ Trans instance).
* dataMult_append, hasGlobal_append distribute over ++.
* Equiv.append — congruence under concatenation.
* Equiv.errsChange_singleton — explicit witness that ErrCount
  differences don't break Equiv.
* Equiv.cons_zero — a (r, val ⟨0, m⟩) record can be absorbed
  into the tail.

Future work (deferred):
* Equiv-congruence for filter/project/cross/negate/unionAll.
* filter_cross_pushdown_left modulo Equiv (the rewrite that
  fails under strict equality).
* Equiv ↔ multiset bijection.

45 lean jobs green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per review (mgree). The placeholder variant was a stub left over
from initial scaffolding; divisionByZero is the real variant the
skeleton needs. Strict.lean's four err-strict counterexample
proofs switch to .divisionByZero — any concrete EvalError works
to instantiate "some err value."

Did NOT do the and/orN rename in the same commit: it's an
11-file, 33-occurrence cascade through ColRefs, Pushdown,
MightError, FilterFusion, FilterFusion2 (every match arm on .and/.or
constructors). Will report scope separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per review (mgree). Original docstring said the skeleton uses binary
and/or "rather than" variadic, which was misleading — the skeleton
carries both. New docstring explains why:

* Binary and/or is the natural object for the four-value truth
  tables and the filter-fusion theorems.
* Variadic andN/orN matches the runtime MirScalarExpr shape so
  optimizer rewrites that target the runtime AST can be modeled
  faithfully.

The two share semantics via evalAndN = right-fold of evalAnd, so
every binary-form law lifts. The dual representation is
intentional, with bounded duplication only on eval, ColRefs, and
Pushdown.

A future cleanup may collapse the constructors and treat binary as
sugar; the doc notes the cost (cascade through ~33 match sites).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Major revision after review feedback (mgree) + design discussion.

New baseline (replacing diff-encoded multiplicities):

* Stream record shape: (row : List Datum, diff : Int, err_diff : Int).
  Both ordinary Int. Both retract. Single-stream form equivalent to
  two-collection (data, errors).

* Cell-scoped errs: Datum::Error inside the row (unchanged design).
* Row-escalation: operator moves multiplicity from diff to err_diff.
  DataflowError carries structured payload.
* Global-scoped: DiffWithGlobal = val(Int) | global, absorbing only.

Predicates rule rewritten in operator-explicit form:
* TRUE → (r, diff, err_diff)
* FALSE/NULL → (r, 0, err_diff)
* ERROR e → (r, 0, err_diff + diff)

Joins rule rewritten with two-component multiplication:
(rL, dL, eL) × (rR, dR, eR) = (rL ++ rR, dL·dR, dL·eR + eL·dR + eL·eR).

Alternatives section:
* Per-error-payload diff component (Diff = Int × ErrCount): kept as
  open alternative, NOT rejected. Records what the per-payload form
  buys (which-error visibility at diff level, multi-error
  distinguishability) vs what the baseline buys (Int arithmetic,
  predicate_pushdown soundness under =, no equivalence quotient).
  The Lean branch's worked-out ErrCount form preserved as evidence.

Open question added: should err-side be single Int or per-EvalError
map? Answer depends on a user-facing SQL surface decision.

Other deletions:
* Stream equivalence section removed (no longer needed under the
  two-diff baseline; strict list equality works).
* "Row-scoped errors: DataflowError plus diff-encoded multiplicities"
  reverted to a cleaner statement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per the revised design doc, the stream record shape is now
(row : List Datum, diff : Int, err_diff : Int). The previous
work (Diff = Int × ErrCount, DiffWithGlobal, UnifiedStream2,
Equiv, fusion/pushdown theorems on the per-error-payload form) is
preserved in git history on this branch as a worked alternative.

Files deleted (28):
* All diff-encoding modules: DiffErrCount, DiffSemiring,
  DiffWithGlobal, Equiv.
* All stream-shape modules: UnifiedStream(2), TimedConsolidate(2),
  Consolidate(2), UnifiedConsolidate(2), Triple(2).
* All operator modules: Join(2), SetOps(2), ErrStream, Aggregate,
  GroupBy.
* All fusion / pushdown / demand modules: JoinPushdown,
  FilterFusion(2), ProjectFusion(2), Demand(2).

Files retained (scalar-level + bag):
* Datum, Expr, Eval, Boolean, PrimEval, Laws, Strict (scalar
  evaluation + truth tables).
* ColRefs, Pushdown (column-reference analyses + Expr.subst).
* Coalesce, MightError, Variadic, ExprVariadic.
* Bag (Row := Env, plus simple bag-level operators).

New file:
* Mz/Stream.lean — fresh skeleton on the two-diff baseline.
  StreamRecord {row, diff, err_diff}, filter / project / negate /
  unionAll / cross with reduction lemmas. Per-record filter
  follows the design doc's Predicates rule exactly: data drops on
  false/null/int, migrates to err side on .err (with diff added
  to err_diff). Cross uses the two-component multiplicative rule.

18 lean build jobs green (was 45 under the previous design).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per review (mgree). After the restart, the cascade is small enough
to absorb. Expr loses the .and / .or constructors; they become
sugar:

  Expr.and a b := Expr.andN [a, b]
  Expr.or  a b := Expr.orN  [a, b]

The skeleton's AST now matches MirScalarExpr's variadic-only
VariadicFunc::{And, Or} shape exactly. Binary surface preserved
for laws + proofs via the sugar abbrev.

Cascade through 6 files:
* Eval.lean — drop .and / .or arms from eval.
* ColRefs.lean — drop arms from colReferencesBoundedBy /
  colReferencesUnused / colShift defs + 5 supporting theorems
  (eval_append_left_of_bounded, eval_append_right_shift,
  eval_replaceAt_of_unused, colReferencesBoundedBy_mono,
  colReferencesUnused_of_bounded, colShift_zero, colShift_add).
* Pushdown.lean — drop arms from Expr.subst + eval_subst.
* MightError.lean — drop arms from might_error + might_error_sound.
* ExprVariadic.lean — eval_andN_binary / eval_orN_binary now
  trivially rfl under the sugar def.
* Laws.lean — eval_and_comm_of_no_might_error and
  eval_or_comm_of_no_might_error rewritten to go through
  evalAndN_binary / evalOrN_binary (the singleton-extra-element
  in evalAndN unfolds via evalAnd_true_right).

18 lean jobs green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…options; evaluation-order equivalence

Restructure the Alternatives section by the four semi-orthogonal
dimensions of the design space (cell encoding, row / collection
encoding, global encoding, evaluation-order equivalence) plus the
cross-cutting channel-structure choice. Each subsection lists the
chosen option alongside open and rejected alternatives so future
iteration has the full inventory in one place.

New options surfaced:

* Cell: severity-graded `Datum::Error(severity, EvalError)`
  generalizing the strict/null lattice to a per-cell severity
  lattice.
* Row: sum-type row carrier
  `Payload = Row(List Datum) | Err(DataflowError)` with a single
  ordinary `Int` diff, closest to today's two-collection model
  (`data` / `errors`) merged into one stream signature.
* Row: N-component labeled diff
  `(d_valid, d_eval_err, d_decode_err, ...)`, intermediate
  between two-diff and per-payload `Int × ErrCount`.
* Channel: sidecar error stream keyed by stable RowKey.
* Channel: sparse two-diff wire format (default-on optimization).

New dimension: evaluation-order equivalence. SQL leaves execution
order unspecified outside CASE, so any rewrite that touches errors
needs a relation coarser than strict equality. The doc names four
candidates — strict `=`, error-set equivalence, refinement preorder
(errors as bottom, no-spurious-errors posture), refinement preorder
dual (errors as top, PostgreSQL posture) — and lists concrete
counterexamples for each: `evalAnd` left-bias on err / err,
`evalPlus` associativity on bounded `Int`, predicate pushdown
across `Cross`, constant folding inside discarded `AND` arms.

The summary table at the end of the Alternatives section lays out
every option against retractability, cell-error expressivity,
per-record cost, open-universe payloads, and chosen / rejected
status.

A new open question records the equivalence choice and lists the
rewrites it gates.
…orderable SQL eval

Mechanize the three equivalence relations on `Datum` named in the
design doc's *Evaluation-order equivalence* alternative. The current
`eval` function targets strict equality on its `Datum` codomain; the
relations here are the layer at which we phrase rewrites whose strict-
equality form is false.

* `Datum.eqErrSet a b := a = b ∨ (a.IsErr ∧ b.IsErr)` —
  the minimal upgrade from strict equality. Collapses all `.err`
  payloads into one equivalence class. Reflexive, symmetric
  (`eqErrSet_symm`). The headline witness is
  `evalAnd_err_err_eqErrSet_comm`: `evalAnd` is left-biased on
  err / err inputs (picks the left payload), so under strict equality
  swapping operands gives different `Datum`s; under `eqErrSet`
  commutativity is recovered.

* `Datum.refines a b := a = b ∨ a.IsErr` — partial order with
  errors as the least-defined element. An optimizer rewrite is
  sound under the "no spurious errors" posture iff
  `eval e1 ⊑ eval e2` pointwise. Reflexive (`refines_refl`),
  transitive (`refines_trans`), `err`-refines-anything
  (`err_refines`).

* `Datum.refinesDual a b := a = b ∨ b.IsErr` — dual order with
  errors as the top. PostgreSQL's "spurious errors permitted"
  posture. Reflexive (`refinesDual_refl`), transitive
  (`refinesDual_trans`), anything-refines-err (`refinesDual_err`).
  `refinesDual_iff_refines_swap` witnesses that the dual is just
  the reverse of `refines`; the two are kept separately so theorem
  statements can pick the intuitive direction.

Bridge lemmas: `eqErrSet_of_refines_both` and
`eqErrSet_of_refinesDual_both` — two-way refinement under either
preorder collapses to `eqErrSet`. `eqErrSet_of_eq` and
`refines_of_eq` embed strict equality into each.

The module's docstrings catalog counterexamples I attempted and
could not prove under any candidate relation:

* `evalAnd` not strictly commutative on err / err. Hidden in the
  current skeleton because `EvalError` has only one variant; fires
  as soon as a second variant is added. Recovered by `eqErrSet`.

* `evalPlus` associativity on bounded `Int`. The current `evalPlus`
  uses unbounded Lean `Int` so the counterexample is moot; lives at
  the bounded-int boundary where `MAX_INT32 + 1` overflows. Needs
  the refinement preorder direction LHS ⊑ RHS, not `eqErrSet`.

* Predicate pushdown on the err side under `cross`. Fails on
  record-level carrier mismatch under strict equality and
  `eqErrSet`-lifted-to-streams. Closes only under value-only
  equivalence under non-determinism — the motivating use case for
  lifting `eval` to a non-deterministic semantics in a future
  iteration.

* Preservation of `refines` by binary primitives. Mechanical lift
  of `Mz/Strict.lean`'s strictness lemmas against `refines` rather
  than `=`. Out of scope for this module.

19 build jobs, all green. No sorry.
…mple

Mechanize the bounded-arithmetic counterexample the design doc names
in §"Evaluation-order equivalence" as the motivating example for
relaxing strict equality on `Datum`. Where `Mz/Equiv.lean` carried
this as documented prose, it is now a live theorem.

* `Datum.EvalError` gains an `overflow` variant. Production
  `EvalError` (in `src/expr/src/scalar.rs`) has many such variants;
  one is enough to fire the asymmetry.
* `evalPlusBounded (max : Int)` in `Mz/PrimEval.lean` mirrors
  `evalPlus` with a symmetric range `[-max, max]`; an `.int + .int`
  result outside the range returns `.err .overflow`. Strict on `.err`
  and `.null` like the unbounded primitive.
* `Mz/EquivBounded.lean` discharges the associativity counterexample
  at `x = max`:
  - `evalPlusBounded_assoc_max_strict_ne` — strict equality fails.
    LHS folds `max + 1` first and errs; RHS folds `1 + (-1) = 0` and
    returns `.int max`. Disjoint constructors.
  - `evalPlusBounded_assoc_max_refines` — LHS ⊑ RHS under `refines`
    (errors as bottom). Witnesses the "no spurious errors" posture.
  - `evalPlusBounded_assoc_max_not_refines_rev` — concrete witness
    that the reverse direction fails. `.int max` is not `IsErr`.
  - `evalPlusBounded_assoc_max_refinesDual` — RHS ⊑ LHS under
    `refinesDual` (errors as top). Witnesses the PostgreSQL posture.

The bound is symmetric for proof economy; widening to
`[MIN_INT32, MAX_INT32]` is mechanical. No `sorry`. Build green
(20 jobs).
…nto image

Prepare for indexed-arity stream records by adding Mathlib as a
dependency. The dependency is declared via the Reservoir-style
`[[require]]` form in `lakefile.toml` pinned to `v4.29.1` — the same
tag the project's `lean-toolchain` already targets.

* `lakefile.lean` → `lakefile.toml` per the upstream wiki guidance.
* `lake-manifest.json` regenerated; `mathlib` resolves to commit
  `5e932f97dd25` (`v4.29.1`).
* `Dockerfile` bakes the prebuilt-olean cache into the image. The
  `COPY` layer brings only `lakefile.toml`, `lean-toolchain`, and
  `lake-manifest.json`, so the heavy
  `lake update && lake exe cache get` step is rebuilt only when
  those inputs change — not on Mz/ source edits.
* `git config --system --add safe.directory '*'` so the
  agent-uid container can read the root-owned Mathlib checkout
  without git's `safe.directory` check tripping. Without this lake
  falls through to "URL has changed" and re-clones every dependency
  on every run.
* `ci/test/lean-semantics.sh` bind-mounts only `Mz/` and `Mz.lean`
  over the image's placeholders; everything else under `/workspace`
  is supplied by the image. Lake writes the Mz library's own build
  artifacts to the container's ephemeral overlay.
* `.gitignore` / `.dockerignore` skip `.lake/`.

Fresh image build takes ~5 min (clone + cache get). Cached runtime
build is ~10 s of lake work plus docker container overhead. 20 jobs
green; no semantic content changed in this commit.
Pilot of an arity-indexed stream encoding alongside the existing
untyped `Mz/Stream.lean`. Row width is in the type
(`RowN n = List.Vector Datum n`), and operator signatures expose
arity directly:

* `filter : Expr → StreamN n → StreamN n`
* `project : List.Vector Expr m → StreamN n → StreamN m`
* `cross : StreamN n → StreamN m → StreamN (n + m)`

The pilot is intended to answer whether the indexed encoding is
easier or harder to prove against than the untyped form, ahead of
columns-by-`Fin n` reasoning needed when joins land. Two load-
bearing tests run:

* `crossOne_assoc` — record-level associativity, with the arity
  cast applied through `List.Vector.congr`. Proof is ~10 lines
  split by `StreamRecordN.mk.injEq`: row equation closes by
  `List.append_assoc` after `Subtype.ext`; diff is `mul_assoc`;
  err_diff is `ring`.

* `cross_assoc` — stream-level associativity. Statement requires
  the same arity cast lifted record-wise via `castStreamN`. Proof
  is two inductions (`s` and a sub-induction over `t` factored as
  `cross_map_left`) plus an inner `List.map_congr_left` and the
  record-level `crossOne_assoc`. ~25 lines.

Reduction lemmas (filter_nil / filter_append / project_append /
cross_nil_left / cross_nil_right / cross_cons_left /
cross_append_left) follow the same shape as the untyped `Stream.lean`
versions; the differences are confined to argument types and the
absence of `_ _ _`-style applications because Mathlib's
`List.map_append` / `List.flatMap_append` are stated with implicit
arguments.

Verdict: tolerable cost. The cast machinery (`castStreamRecordN`,
`castStreamN`, `Vector.congr`) is small and isolated; the bulk of
proof effort goes into the same record arithmetic the untyped form
already pays. Next: `filter_cross_pushdown_left` on the indexed
form, where the arity bound on the predicate's column references
becomes a type-level fact, to compare against the prose obligation
in the untyped form.

763 build jobs green (pilot pulls in Mathlib.Data.Vector +
Mathlib.Tactic.Ring). No `sorry`.
…lper

Attempting `filter_cross_pushdown_left` on the indexed-arity stream
surfaces a soundness gap in the two-diff model: the classical
relational pushdown `filter p (cross sL sR) = cross (filter p sL) sR`
is unsound on the err side whenever the right stream carries
non-zero `err_diff`.

What works:

* `filterOne_cross_pushdown_left_true` — when the predicate
  evaluates to `.bool true`, filter is the identity and `filterOne`
  commutes with `crossOne` directly.
* `filterOne_cross_pushdown_left_err` — when the predicate
  evaluates to `.err _`, data multiplicity migrates to err on both
  sides; the cross-multiplied err_diff sums rearrange to the same
  value by ring.

What does not work:

* `.bool false / .null / .int` branches. The cross's err_diff
  formula multiplies `recL.diff` against `recR.err_diff`; filter
  zeros `recL.diff` before the cross, dropping the right's
  err contribution. The LHS retains it because the filter runs
  *after* the cross has already multiplied the err in.

A concrete witness is mechanized as
`filterOne_cross_pushdown_left_unsound`:
  `recL = (·, diff = 1, err_diff = 0)`,
  `recR = (·, diff = 0, err_diff = 1)`,
  `p = .lit (.bool false)`.
LHS err_diff = 1, RHS err_diff = 0.

Operationally: the right stream's error is unconditional — it
surfaces regardless of the left's predicate. Pushdown undercounts
right-side errors. Closing the obligation requires either a
non-deterministic semantics that routes right's err through a
channel filter does not multiply against, a refinement equivalence
that ignores err_diff drops, or a stream encoding where errors are
not multiplied through cross.

This mirrors the per-payload-diff finding documented in
`Mz/Equiv.lean` for predicate pushdown over cross; the indexed-arity
form makes the obligation precise enough to expose the same gap on
the two-diff model.

Also adds `bin/in-image`, a reusable wrapper around the `docker run`
invocation that bind-mounts `Mz/` and `Mz.lean` over the image's
placeholders. Lets probes against Mathlib sources or one-shot lake
commands reuse the prebuilt cache without retyping the docker
incantation.

763 jobs green. No `sorry`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants