doc/semantics: Lean 4 mechanization of UnifiedStream error semantics by antiguru · Pull Request #36614 · MaterializeInc/materialize

antiguru · 2026-05-18T21:20:06Z

Lean 4 mechanization of Materialize's error-handling design at doc/developer/semantics/, plus the design doc that drives it at doc/developer/design/20260517_error_handling_semantics.md.

Current state (post-restart)

The skeleton is on the two-diff baseline described in the design doc's "Stream shape" section. Each stream record is (row : List Datum, diff : Int, err_diff : Int) — both ordinary Int diffs, both retractable. Cell-scoped errors live in Datum::Error inside the row; row-escalation moves multiplicity from diff to err_diff; global-scoped errors (absorbing diff marker) are spec-only.

What's in the skeleton today

Scalar evaluation: Datum, Expr, Eval, PrimEval (evalAnd / evalOr / evalIfThen / arithmetic / comparison / coalesce), four-value truth tables in Boolean / Laws.
Column-reference analyses: ColRefs (colReferencesBoundedBy, colReferencesUnused, colShift, monotonicity).
Substitution: Pushdown (Expr.subst, eval_subst).
Strictness: Strict (err- and null-propagating classes for binary primitives).
Bag layer: Bag (Row := List Datum, Relation := List Row, filterRel / project for simple bag laws).
Stream layer (new): Stream — StreamRecord {row, diff, err_diff} + filter / project / negate / unionAll / cross, plus reduction lemmas. Filter follows the design doc's Predicates rule exactly (.bool false / .null / .int zero data, .err migrates data into err side). Cross uses the two-component multiplicative rule.

18 build jobs, all green. No sorry.

What's preserved in branch history (not in current tree)

The branch carried an earlier Diff = Int × ErrCount per-error-payload design through ~33 build jobs before being restarted. That iteration produced:

The diff-multiplication soundness counterexample against predicate_pushdown over Cross(L, R) (filed as a PR comment).
The "stream equivalence" relation needed to recover soundness for the per-payload form.
Fusion / pushdown theorems on the per-payload form (FilterFusion2, ProjectFusion2, Demand2).

The design doc's Alternatives section preserves the per-payload form as a live alternative — the choice between single-Int err-diff and per-EvalError map is not yet settled and turns on whether a user-facing SQL surface needs to distinguish error kinds at the row level. The Lean work is the evidence for what each choice costs.

Major design-doc changes in this PR

Cell-scoped errors specification (Datum::Error variant).
Row-scoped errors via two-diff record shape; DataflowError carries structured payloads.
Global-scoped errors via absorbing diff marker.
Operator obligations under the two-diff shape — Predicates, Joins, Aggregates rules.
Alternatives include carrier-replacement (rejected), absorbing-only diff (rejected), and per-error-payload diff (open).
Open questions: storage cost, source-decoder errors with no row, err-side single-Int vs per-EvalError map.

Build

ci/test/lean-semantics.sh — Docker, Lean v4.29.1, lake.

🤖 Generated with Claude Code

Add design doc proposing per-cell `Datum::Error`, row-scoped errors via the existing `DataflowError`, and a diff-semiring sketch for collection-scoped errors. Define the four-valued AND/OR truth tables ({TRUE, FALSE, NULL, ERROR}) that the rest of the spec rests on. Add a v1 Lean 4 mechanization of the boolean fragment at doc/developer/semantics/. The skeleton models `Datum`, `Expr`, `evalAnd`/`evalOr`, and proves all 32 cells of the AND/OR truth tables. The pattern order in `evalAnd`/`evalOr` matches the current Rust runtime in src/expr/src/scalar/func/variadic.rs; deviating from that runtime requires a corresponding diff in Boolean.lean. CI runs lake build in a self-built Docker image (ubuntu:26.04 + elan + lean-toolchain pin), gated on changes under doc/developer/semantics/. No Mathlib dependency. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Expand the Lean skeleton with a small set of follow-up theorems that the optimizer roadmap depends on. * `Datum.IsErr` propositional predicate plus `DecidablePred` instance, for use as a hypothesis in algebraic laws. * `Expr.not` constructor and `evalNot`, with the 4-cell `NOT` truth table and `not_not` (involution on the boolean fragment, no-op on `null` and `err`). * New `Mz/Laws.lean`: - `evalAnd_idem`, `evalOr_idem`: idempotence holds unconditionally, including on `err` (same error preserved). - `evalAnd_comm_of_no_err`, `evalOr_comm_of_no_err`: commutativity conditional on neither operand being an error. This is the law an optimizer that has run `might_error` analysis can use to justify conjunct reordering. Unconditional commutativity fails over `(err e₁, err e₂)` with distinct payloads. Associativity and `might_error` soundness are deferred. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Land the optimizer-facing theorem the previous commit's roadmap called out as priority one. * Switch `eval` for `.ifThen` to a strict `evalIfThen` helper. The observable `Datum` is identical to the lazy form in this total skeleton (both branches are total functions); strictness collapses the ifThen proof case to a single `cases` on the condition. * Redefine `Env.get` by primitive recursion so inductive proofs can `cases` on the defining equations directly without going through `List.getD`. * New `Mz/MightError.lean`: - Per-operator helper lemmas: `evalAnd_not_err`, `evalOr_not_err`, `evalNot_not_err`, `evalIfThen_not_err`. - `Expr.might_error` conservative analyzer (literal err taints every ancestor; columns assumed err-free). - `Env.ErrFree` predicate and `Env.get_not_err` lemma. - `might_error_sound`: if `¬e.might_error` and `env.ErrFree` then `¬(eval env e).IsErr`. Structural induction on `e`, one helper per Expr constructor. * `Mz/Laws.lean`: lift `evalAnd_comm_of_no_err` and `evalOr_comm_of_no_err` through `eval` via `might_error_sound`, giving `Expr`-level reorder-safety theorems `eval_and_comm_of_no_might_error` and `eval_or_comm_of_no_might_error`. These are the laws an optimizer would cite when reordering conjuncts in the boolean fragment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* `ErrStrictUnary`, `ErrStrictBinary`, `NullStrictUnary` predicates capturing per-position strictness. The cell-scoped analogue of PostgreSQL's `STRICT` qualifier on `NULL`, applied to `err`. * Positive instances: `evalNot` is both err-strict and null-strict; the condition slot of `evalIfThen` is err-strict. * Closure: `ErrStrictUnary.comp` proves strictness is preserved under function composition, which is the property an optimizer relies on when fusing chains of strict scalar functions into a single MFP expression. * Negative results: `evalAnd` and `evalOr` are not err-strict in either argument position. The counterexamples (`FALSE AND ERR`, `TRUE OR ERR`) are also canonical regression tests — a future change to AND/OR that promoted these cells to `ERR` would break exactly these proofs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Add `evalCoalesce : List Datum → Datum` and prove the laws that characterize the proposed cell-scoped coalesce. Semantics: walk operands left to right; return the first concrete value. When none exists, apply a `null`-beats-`err` tiebreak — if any operand was `null`, return `null`; otherwise, return the first `err`. The state machine in `Coalesce.go` tracks a `seenNull` sticky bit and a first-`err` payload. The defining error-rescue property (`coalesce_err_rescue_bool`) and its symmetric null-rescue (`coalesce_null_rescue_bool`) together establish that a later concrete operand rescues an earlier `err` exactly as it rescues an earlier `null`. This is the explicit, user-controllable error trap referenced in the design doc. The `null`-beats-`err` tiebreak (`coalesce_null_then_err`, `coalesce_err_then_null`) preserves the PostgreSQL behavior users expect when all operands are NULL, and is dual to the strict-function rule documented in `Mz/Strict.lean`: strict functions promote `err` above `null` in per-cell results; `coalesce` is non-strict and demotes `err` below `null` in the tiebreak. Defer wiring `evalCoalesce` into `Expr` as a `.coalesce` constructor until the variadic `And`/`Or` ctors land — the termination story is shared between them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Add the missing identity laws and the variadic-fold counterparts of the binary AND/OR evaluators. * `Mz/Laws.lean`: `TRUE` is the two-sided identity for `evalAnd` and `FALSE` is the two-sided identity for `evalOr`. The proofs verify each cell of the non-identity argument. These identities are also the seed values used by the variadic folds. * `Mz/Variadic.lean`: `evalAndN`, `evalOrN : List Datum → Datum` defined by right-fold. Right-fold gives the cons recurrence by `rfl` and avoids the associativity argument a left-fold would require. - Cons recurrence, nil, singleton, and binary equivalence with `evalAnd` / `evalOr`. The binary equivalence is the bridge that transports every binary truth-table cell and algebraic law to the variadic form on lists of length two. - Absorption theorems: a single `FALSE` anywhere in the operand list forces `evalAndN` to `FALSE`; symmetric statement for `TRUE` and `evalOrN`. These justify the runtime short-circuit optimization. Wiring `evalAndN`, `evalOrN`, and `evalCoalesce` into `Expr` as list-carrying constructors is deferred — the termination story for the nested inductive is shared between them and is best landed in a single commit once it's been validated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Add `.andN`, `.orN`, and `.coalesce` to `Expr` as variadic constructors carrying `List Expr`, with the corresponding `eval` clauses and Expr-level reduction lemmas. The variadic primitives (`evalAndN`, `evalOrN`, `evalCoalesce`) move to a new `Mz/PrimEval.lean` to avoid the circular dependency that would otherwise arise between `Mz/Eval.lean` and the variadic-law files. File reorganization: * New `Mz/PrimEval.lean`: every evaluator that operates on `Datum` or `List Datum` without referring to `Expr` — the binary boolean primitives, the environment, and the variadic primitives. * `Mz/Eval.lean` shrinks to the `eval : Env → Expr → Datum` definition. * `Mz/Variadic.lean` and `Mz/Coalesce.lean` drop their now-moved evaluator definitions and keep only the theorems. * `Mz/Boolean.lean` and `Mz/Strict.lean` switch their imports to `Mz.PrimEval` since they never used the Expr-level evaluator. `might_error_sound`: * `induction` cannot be used on the now-nested-inductive `Expr`, so the soundness proof is rewritten as a recursive `theorem` that pattern-matches the constructor and recurses on subexpressions. The signature also makes `env` explicit so the recursive calls in compound cases stay readable. * The list-carrying constructors are handled vacuously: the conservative `might_error` for them returns `true` unconditionally, so the soundness premise is absurd in those cases. A future refinement will tighten the analyzer to inspect operands. New `Mz/ExprVariadic.lean`: * `eval_andN`, `eval_orN`, `eval_coalesce` connect the Expr-level evaluator to the variadic primitives. * Empty / singleton cases for all three. * Binary equivalence: `eval env (.andN [a, b]) = eval env (.and a b)` and the dual for `.orN`, transporting `Mz/Variadic.lean`'s binary laws through `eval`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ness Refine `Expr.might_error` for the `andN` and `orN` list-carrying constructors so the analyzer is no longer maximally conservative for them. The analyzer now recurses into the operand list via `Expr.argsMightError`, declared mutually with `Expr.might_error` so Lean's structural-recursion checker accepts both sides without an explicit termination measure. `Expr.argsMightError_of_mem` is the membership-driven introduction form needed by the soundness proof — it turns "some operand might error" into the bool fold. Its contrapositive is what extracts a per-operand non-erroring hypothesis from the analyzer's verdict on the whole list. `might_error_sound` for the `.andN` and `.orN` constructors now performs real work: * Reduce `eval env (.andN args)` to `evalAndN (args.map (eval env))` via the existing `simp only [eval]` pattern. * Apply the list-level `evalAndN_not_err` (also added in this commit) with a per-operand non-erroring hypothesis. * Extract that per-operand hypothesis by destructuring `List.mem_map` and applying `Expr.argsMightError_of_mem` contrapositively against `hMe`. * Recurse via `might_error_sound` on the individual operand. `.coalesce` is still tainted unconditionally — the rescue analysis is a separate follow-up since the helper `evalCoalesce_not_err` needs a different list induction (state-machine over `Coalesce.go`). `Expr.might_error` and `Expr.argsMightError` are now mutually recursive, which forces well-founded compilation. Existing `simp only [Expr.might_error]` patterns in the proof are unaffected because they target equation lemmas, not definitional reduction. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Refine `Expr.might_error` for the `.coalesce` constructor so the analyzer captures the rescue rule: `coalesce` might error only when every operand might error (and the operand list is non-empty). A single statically-safe operand makes the whole expression safe. Analyzer changes (mutual block): * `Expr.argsAllMightError`: companion fold to `argsMightError`, empty-list base `true`, cons case `e.might_error && rest...`. * `.coalesce []` returns `false` (empty coalesce is `.null`, never errors). * `.coalesce (a :: rest)` returns `a.might_error && argsAllMightError rest`, which is equal by definition to `argsAllMightError (a :: rest)`. Soundness chain for the `.coalesce` case: 1. `Expr.exists_safe_of_not_argsAllMightError` extracts a statically-safe operand from the negation of the analyzer. 2. `might_error_sound` recurses on that operand and produces a not-`IsErr` witness on its evaluated value. 3. `Coalesce.go_not_err` (new) is the state-machine lemma: any `Coalesce.go` invocation whose remaining list contains at least one non-erroring datum (or whose `seenNull` is already set) does not return an error. Proof is structural recursion on the list, plumbing the safety witness through the `.null` and `.err` arms. 4. `evalCoalesce_not_err_of_some_safe` packages the above into the surface lemma soundness invokes. `Coalesce.go` is made non-private in `Mz/PrimEval.lean` so the state-machine lemma can reference it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Add the `Expr`-level absorption theorems that an optimizer cites when folding a variadic `AND` containing a `FALSE` operand to `FALSE` (symmetric for `OR` with `TRUE`). Two forms each: * Semantic premise: `(∃ e ∈ args, eval env e = .bool false) → eval env (.andN args) = .bool false`. Sufficient for any operand the optimizer has reduced to a constant boolean. * Syntactic premise: `Expr.lit (.bool false) ∈ args → eval env (.andN args) = .bool false`. The corollary specialized to the case where a literal `.bool false` is syntactically present in the operand list. Useful for simple constant folding. Each proof reduces `eval env (.andN args)` to `evalAndN (args.map (eval env))` via `eval_andN`, then witnesses membership of `.bool false` in the mapped list through `List.mem_map`, then invokes `evalAndN_false_absorbs` from `Mz/Variadic.lean`. The `Or`/`True` direction is symmetric. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Model relations as `List Row` and define the two basic relational operators on top of the per-row evaluator. * `filterRel pred rel` keeps rows whose predicate evaluates to `.bool true`. Rows evaluating to `.bool false`, `.null`, or `.err` are dropped. The skeleton silently drops `err` rows; a real implementation would route them to a separate error collection. Noted in the docstring as a known modelling gap. * `project es rel` evaluates each scalar in `es` against every row and produces a relation whose row width is `es.length`. Laws: * `filterRel_idem`: applying the same predicate twice is the same as applying it once. Proof: `List.filter_filter` + `Bool.and_self`. * `filterRel_comm`: filter commutes with filter. Proof: `List.filter_filter` twice + `Bool.and_comm`. * `project_length`: projection preserves cardinality. Direct consequence of `List.length_map`. * `project_nil`: the empty projection collapses every row to `[]`. All proofs are one-liners over Lean core's list lemmas; the laws themselves are the first relational-level rewrites an optimizer cites. Predicate pushdown across projection, joins, and aggregates remain follow-ups. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Model the dataflow's data and error collections as a pair carried together through operators, mirroring the structure of the real Materialize dataflow where erroring rows are routed to a separate collection rather than silently dropped. * `BagStream` is a structure with `data : Relation` and `errors : List EvalError`. * `BagStream.ofRelation` injects a plain relation with no accumulated errors. * `BagStream.filter pred s` evaluates `pred` on every row of `s.data`. Survivors (`.bool true`) stay in the data field; erroring rows contribute their payloads to the error field via `errorRows`; everything else is dropped. Two helper lemmas carry the structural fact "predicate evaluated to `.bool true` on every survivor of `filterRel`": * `rows_in_filterRel_eval_to_true` unfolds the survival condition. * `errorRows_eq_nil_of_all_true` says a relation where every row evaluates to `.bool true` contributes no errors. `errorRows_filterRel` combines them and is the key fact behind `BagStream.filter_idem` — the second pass of an idempotent filter sees only survivors of the first, so it contributes no new errors. The overall stream is therefore unchanged on the second filter, at both the data and the error level. `BagStream.project` and stream-level filter commutativity follow in later iterations; commutativity at the error level requires multiset equality on `List EvalError` since list order differs across permutations. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Add the classical relational rewrite: filterRel p (project es rel) = project es (filterRel (p.subst es) rel) The rewrite lets an optimizer move a `WHERE` clause through a `SELECT` clause by substituting the projection's source scalars into the predicate's column references. * `Expr.subst es e` replaces each `Expr.col i` in `e` with the i-th scalar of `es`, with out-of-bounds references defaulting to `.lit .null` so the result still evaluates to `.null`. Defined mutually with `Expr.substArgs` for the nested-list constructors (`andN`, `orN`, `coalesce`). * `Expr.substArgs_eq_map` bridges the recursive helper to the more ergonomic `args.map (·.subst es)` form for use in proofs. * `Env.get_map_eval` is the column-lookup compatibility lemma: reading column `i` from the projected row equals evaluating the i-th projection scalar against the original row. * `eval_subst` is the headline correctness theorem: substituting and then evaluating against the original row equals evaluating the unsubstituted expression against the projected row. Proof is structural pattern recursion mirroring `Expr.subst`. * `filterRel_pushdown_project` packages `eval_subst` plus `List.filter_map` + `List.filter_congr` into the relational form an optimizer cites. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Model the diff-field encoding of collection-scoped (global) errors proposed in the design doc. The standard differential dataflow diff type (typically `ℤ` for multiset counts) is extended with an absorbing `error` element; any sum or product involving `error` yields `error`, encoding "this collection is invalid at time `t`". * `DiffWithError α` is a two-constructor inductive: `val (x : α)` for ordinary diffs, `error` for the absorbing marker. * `Add`, `Mul`, `Zero`, `One` instances lift the underlying operations and respect the absorbing behavior. * Absorption laws (`error_add_left`, `error_add_right`, `error_mul_left`, `error_mul_right`) are the defining property. * Commutativity, associativity, and left-distributivity are proved parameterically over the underlying `α`: each lemma takes the corresponding law on `α` as a hypothesis and discharges the `DiffWithError` version by case analysis. The lemmas are not packaged as `Semiring`/`CommRing` typeclass instances because the skeleton avoids depending on Mathlib for build-time reasons. Adding the typeclass wiring is straightforward once Mathlib is on the dependency list. Tying `DiffWithError` to an actual `(Row, Time, Diff)` triple stream and proving the operator-level propagation theorems is the next step; the present file lays the algebraic groundwork those theorems rely on. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Model the spec's preferred encoding where data rows and errors flow through one collection rather than the `(data, errors)` pair carried by `BagStream`. The split form in `Mz/ErrStream.lean` mirrors the current Materialize runtime; the unified form mirrors what the diff-semiring extension naturally produces (any record with an `error` diff is a row-scoped error, and the same encoding extends to collection-scoped errors). * `UnifiedRow` is a sum type — `row (r : Row)` or `err (e : EvalError)`. The bare `err` variant preserves the current property that errors carry no row context. * `UnifiedStream := List UnifiedRow`. * `UnifiedStream.ofBag` packs a `BagStream` (data rows first, errors second). `UnifiedStream.split` is its inverse, splitting the carrier back via `filterMap pickRow` / `filterMap pickErr`. * `UnifiedStream.filter` runs the predicate per record, routing erroring rows to `.err` records in place. Existing `err` records pass through unchanged — the carrier handles error propagation without per-operator boilerplate. Round-trip theorem `UnifiedStream.split_ofBag : split (ofBag s) = s` proved at both the data and error field level, using four private filterMap-over-tagged-list helpers and structural induction. The cross-direction `filter ∘ ofBag ≈ ofBag ∘ filter` is left for future work — equivalence is only up to multiset equality on errors because `ofBag` fixes a concat order between data and errors, while filters that interleave the two would produce a different list order. `@[ext]` added to `BagStream` so the round-trip proof can use `apply BagStream.ext`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`List.filterMap` was passed to four `simp` invocations in the filterMap-of-mapped-list helpers but never actually used in the rewriting — the cons recurrence and the `pickRow` / `pickErr` arms combined with the IH are enough. Drop the unused arg; the linter-clean.

Model the spec rule that SQL aggregates such as `SUM`, `MIN`, `MAX`, and `AVG` propagate `err` like strict scalar functions. NULLs are skipped (matching PostgreSQL's "ignore NULLs" reading); errors are propagated, with the first one in scan order winning. * `aggCountNonNull`: `COUNT(expr)` — counts rows whose value is neither `NULL` nor `err`. Matches PostgreSQL. * `aggStrict f`: variadic strict reduction parametric over the combiner `f`. Empty list (or list of only `NULL`s) returns `NULL`; any `err` returns the first `err`; non-`NULL`/`err` values are combined via `f`. Two propagation theorems: * `aggStrict_err`: any `err` input forces an `err` output. Proof by structural recursion on the list; the `.bool` case branches on `aggStrict f rest` and shows the `err` arm of the inner match is the one that fires. * `aggStrict_no_err`: dual statement under the additional hypothesis that the combiner preserves "no-err". Captures the precondition an aggregate operator can use to guarantee its output is not an error when its column is error-free. `try_*` aggregates (the non-strict variants that swallow `err` into `NULL`) are future work; they satisfy a coalesce-style law rather than strict propagation and warrant their own file once the spec side is clearer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Model the proposed `try_sum`, `try_min`, etc. that swallow `err` into `NULL` instead of propagating. Defined as a post-pass coalesce on `aggStrict`: aggTry f xs = match aggStrict f xs with | err _ => null | r => r Equivalent reading via the existing `evalCoalesce`: aggTry f xs = evalCoalesce [aggStrict f xs, .null] The defining property `aggTry_no_err` says the result is never an `err`. The companion `aggTry_eq_aggStrict_of_no_err` says the non-strict variant agrees with the strict one whenever the strict one would not have erred — so an optimizer that has already proved the inputs error-free can swap `aggTry` for `aggStrict` (and vice versa). Both lemmas reduce to case analysis on `aggStrict f xs`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Model the diff-only slice of differential dataflow's `compact` / consolidation operation: summing a list of `DiffWithError α` values and showing that an `error` diff absorbs the consolidated sum. * `DiffWithError.sumAll` sums a list of diffs starting from `0`, right-associative. The result lives in `DiffWithError α` for any base `α` with `Zero` and `Add` instances. * `sumAll_eq_error_of_mem`: the headline absorption theorem. If `error` appears anywhere in the list, the consolidated sum is `error`. Proof walks the list and uses the absorption laws from `Mz/DiffSemiring.lean` at the matching cons. * `sumAll_val_of_all_val`: companion no-error-preservation theorem. An all-`val` list sums to `val` of some `α`. The full `compact` operator also buckets records by `(row, time)`; that bucketing is orthogonal to the absorption argument and would require `DecidableEq Row` plus a time model, both follow-ups. The per-bucket inner sum modeled here is what an operator-level proof would invoke once those pieces are in place. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Add `TimedRecord` and `TimedStream` to model differential dataflow's record format with errors. `Time := Nat`; the diff is `DiffWithError α` parametric in the base diff type. Operations: * `TimedStream.consolidateAll` sums every diff in the stream, ignoring row and time. The collection-wide diff. * `TimedStream.consolidateAt t` sums every diff at time `t`, ignoring row. The per-time collection diff. Both reduce to `DiffWithError.sumAll`, so the absorption laws from `Mz/Consolidate.lean` transport: * `consolidateAll_eq_error_of_mem`: any `error`-carrying record forces the all-stream consolidation to `error`. * `consolidateAt_eq_error_of_mem`: any `error`-carrying record at time `t` forces the per-time consolidation at `t` to `error`. Per-`(row, time)` bucketing — the form `compact` actually uses in the runtime — is the next refinement and requires `DecidableEq` on `Row`. The current sums are the collection-global diffs at each time slice, which is what operator-level proofs invoke once the bucketing is layered on. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…Stream Two-input relational join modeled on the unified single-collection stream. Cross product is the building block; predicate join is cross followed by `UnifiedStream.filter`. * `UnifiedStream.cross l r` produces one output record per `(lu, ru)` pair: both rows ⇒ concatenated row; either side `err` ⇒ that side's `err` (left wins on conflict, matching the binary AND's first-error rule). * `UnifiedStream.join pred l r := (cross l r).filter pred`. Error propagation falls out of the carrier: an `err` record on either side contributes one `err` to the output per record on the other side, matching the diff-semiring's `error * diff = error` intuition. Theorems: `cross_nil_left` and `cross_nil_right` cover the empty cases (joining anything with the empty stream is empty). The cardinality theorem `(cross l r).length = l.length * r.length` is deferred — needs `List.length_flatMap` + Nat arithmetic that the current skeleton's lemma toolkit handles awkwardly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Partition a relation by an evaluated key expression and aggregate per group. Closes the SQL `SELECT k, AGG(v) FROM r GROUP BY k` shape the rest of the algebra implies. * `Datum` derives `DecidableEq` (via the already-decidable `EvalError` payload) so keys can be compared at runtime. * `groupBy keyExpr rel` walks rows, evaluates the key on each, and inserts into the existing group with that key or starts a new one. Output is `List (Datum × Relation)`. * `aggregateBy keyExpr valExpr f rel` applies `aggStrict` to each group's evaluated value column, producing `List (Datum × Datum)` — one aggregated value per group. Spec divergence on `err` keys documented inline: the skeleton uses standard `DecidableEq Datum`, so two `Datum.err e` values with the same payload collapse into one group. The spec requires every err key to be its own group. The natural refinement is a custom `Datum.groupKeyEq` that returns `false` whenever either side is an err; left for future work. Theorems for now cover the trivial cases (empty, singleton). Cardinality (`sum of group sizes = rel.length`) is the next concrete law and is deferred until the lemma toolkit picks up the needed Nat / List arithmetic helpers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Add `Datum.groupKeyEq` — group-key equivalence that returns `false` whenever either side is `.err`, so two error keys never coalesce even if their payloads happen to match. Build `groupByErrDistinct` and `aggregateByErrDistinct` on top, leaving the existing `groupBy` / `aggregateBy` (which use derived `DecidableEq Datum`) in place as the more permissive variant. Theorems: * `Datum.groupKeyEq_err_left`: simp lemma reducing the err-left case to `false`. * `insertIntoDistinct_err`: inserting an err-keyed row into any group list appends a fresh singleton group at the end. * `groupByErrDistinct_length_of_all_err`: when every row's key evaluates to err, the output has one group per row — closes the spec-divergence note from the previous commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`BagStream.project es s` projects each row in `s.data` through the list of scalar expressions `es`. A row stays in the data collection only when every scalar succeeds on it; otherwise the row's err payloads (one per erroring scalar) are appended to the error collection. Helpers: * `rowAllSafe`: every projected scalar succeeds on a row. * `rowErrs`: collect every err payload from one row's projections. * `projectErrs`: aggregate `rowErrs` across the relation. Laws: * `BagStream.project_data` / `_errors` — per-field reductions. * `BagStream.project_nil_es` — empty projection list keeps every row and produces width-zero output rows. * `BagStream.project_empty_data` — empty data collection projects to empty data, preserving input errors. * `rowErrs_nil_of_all_safe` and `projectErrs_eq_nil_of_all_safe` — when every scalar succeeds on every row, the error collection is unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`UnifiedStream.cross_length` proves the headline relational identity `(cross l r).length = l.length * r.length`. Cross product on the unified carrier produces exactly one output record per `(l, r)` pair regardless of which side carries an error — every err contributes one err record per element of the other side, matching the diff-semiring's `error * d = error`. Companion `UnifiedStream.filter_length_le` proves the predicate filter on the unified stream is non-expanding (each input produces zero or one output). Composing the two gives the corollary `UnifiedStream.join_length_le`: a join's output length is bounded above by `l.length * r.length`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`totalRows g = sum of g.rows.length` is the metric. Theorems: * `totalRows_insertInto`: a single insert adds exactly one row, whether the key matches an existing group or creates a new one. * `totalRows_insertIntoDistinct`: same bookkeeping for the err-distinct variant. * `totalRows_groupBy`: sum of group sizes equals the input relation's length — partitioning loses and duplicates nothing. * `totalRows_groupByErrDistinct`: the err-distinct variant preserves the same invariant. Use an explicit recursive `totalRows` definition instead of `((map ·.2.length).sum`. The latter trips `omega` because the elaborator and the simp set produce two syntactically distinct forms (`(·.2.length)` vs `List.length ·.snd`) that omega treats as independent atoms. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Promote `UnifiedStream` from `List UnifiedRow` to `List (UnifiedRow × DiffWithError Int)`. Every record now carries a differential-dataflow diff augmented with the absorbing `.error` marker. Row-scoped errors still propagate through the `UnifiedRow` carrier; collection-scoped errors propagate through diff multiplication / addition. Changes: * `UnifiedStream.ofBag` tags every bag record with diff `.val 1`. * `UnifiedStream.split` discards the diff component (round trip still goes through because `ofBag` only emits `.val 1` diffs; the cross-direction loses information for diffs ≠ `.val 1`). * `UnifiedStream.filter` preserves the diff of survivors; rerouted rows (predicate errs) keep their multiplicity. * `UnifiedStream.cross` combines carriers via the new `combineCarrier` helper and multiplies diffs through `DiffWithError`'s `Mul` instance. A `.error` diff on either side absorbs the product diff via `DiffWithError.error_mul_{left,right}`. New theorem `cross_diff_error_left` witnesses the absorption: crossing a left-side `.error` diff with any record on the right produces a `.error` diff in the output. Existing theorems (`cross_length`, `filter_length_le`, `join_length_le`, `split_ofBag`) carry through to the new representation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Tighten diff propagation in `UnifiedStream.filter` and add the symmetric `cross` absorption theorem. `filter` now special-cases records carrying a `.error` diff, passing them through unconditionally regardless of predicate outcome. The previous version dropped such records when the predicate evaluated to `.bool false` or `.null`, which violates the semiring law that `.error` must absorb every downstream operator. `filter_length_le` still holds — each input still produces at most one output. Three new theorems witness the absorption: * `cross_diff_error_right`: symmetric counterpart to the existing `cross_diff_error_left` — a `.error` diff on the right side forces the output diff to `.error` for every record on the left. * `filter_preserves_error_diff`: a record `(uc, .error)` in the input survives the filter as `(uc, .error)` in the output, no matter what the predicate is. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`UnifiedStream.consolidate` buckets unified-stream records by carrier (data row or row-scoped err) and sums per-bucket diffs. Headline theorem `consolidate_preserves_error` lifts the diff-semiring absorption rule from `Mz/Consolidate.lean` to the unified stream: an `.error` diff anywhere in the input forces a `.error` diff in the consolidated output for that carrier. Required: * `deriving DecidableEq` on `UnifiedRow` (Row already has it via `Datum`, EvalError too). * Two helper lemmas — `consolidateInto_error_diff` (inserting an `.error` diff yields `.error` for the bucket) and `consolidateInto_preserves_error_mem` (a pre-existing `.error` record survives every subsequent insert). Adding times reduces to running `consolidate` inside each time slice; that lifting is mechanical and left for the per-`(row, time)` follow-up alongside `Mz/Triple.lean`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two companion theorems for `UnifiedStream.consolidate`. `consolidate_length_le`: bucketing only merges, never expands. The output has at most as many records as the input. Helper `consolidateInto_length_le_succ` bounds a single insert at `us.length + 1`. `consolidate_no_error`: if every input diff is `.val n`, every output diff is `.val m`. The diff-semiring's `.val + .val = .val (· + ·)` keeps the consolidated buckets in the ordinary `Int` slice, so `.error` is the only source of absorption. Together with the existing `consolidate_preserves_error`, the three theorems pin down the consolidation operator: absorption on errors, non-expansion on cardinality, and stability on error-free input. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`UnifiedStream.filter_comm`: filter p (filter q us) = filter q (filter p us) when neither predicate errors on the input's data rows. Reduces to `filter_filter_fuse` applied both ways, then equates `.and q p` and `.and p q` via `evalAnd_comm_of_no_err`. New helper `filter_eval_eq`: two filters with eval-equivalent predicates on every data row produce equal outputs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Cite filter_idem (unconditional) and filter_comm (under err-freedom) alongside filter_filter_fuse. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`UnifiedStream.filter_project_pushdown`: filter p ∘ project es = project es ∘ filter (p.subst es), under projsAllSafe es us. Lifts BagStream.project_filter_pushdown_data (data-side only) to the unified stream — errs flow through both pipelines symmetrically under the safety hypothesis, so no data/error asymmetry remains. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Note filter_project_pushdown in the Modelable row for projection-lifting passes. Still missing: project_cross_pushdown (harder — depends on column-split semantics). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`clampToOne_id_of_one`: clampToOne is identity when every record's diff is already .val 1 or .error. Companion to clampPositive_id_of_positive — same threshold-elision pattern, but for the set-semantics post-pass instead of the bag-semantics one. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

antiguru · 2026-05-18T21:29:38Z

Foundational gaps

Ranked by what's currently blocking further coverage.

Tier 1: blocks most remaining modelable passes

Stream equivalence relation. UnifiedStream is List, so equality is list-equality. Many natural laws only hold up to multiset / bag equivalence:
- unionAll_comm — fails as list eq (concat order).
- consolidate_idem — consolidate reverses order on noDup inputs (see SetOps.lean algorithm), so consolidate ∘ consolidate ≠ consolidate.
- cross_comm — Cartesian product is symmetric only as multiset.
- unionAll_distrib_cross_right — interleaves vs groups.
Fix: define UnifiedStream.Equiv ≈ List.Perm (or stronger: zero-diff and same-carrier-sum quotient), or switch to Multiset (UnifiedRow × DiffWithError Int).
Reduce operator (group-by aggregate). Mz/Aggregate.lean exists on List Datum only. No UnifiedStream lift. Single largest infra gap — blocks fusion/reduce, reduce_elision, reduce_reduction, reduction_pushdown. Needs group-key expressions, aggregate functions (sum / count / min / max / ...), and per-group error handling.
Map separate from project. UnifiedStream.project currently replaces the row with es.map (eval r). Rust's Map appends columns. Without separate Map, MapFilterProject and canonicalize_mfp can't be modeled cleanly.

Tier 2: specific pass clusters

Schema / column types. Rows are List Datum with no type info. Blocks:
- non_null_requirements — needs per-column nullability lattice.
- column_knowledge — needs per-column {literal, nullable, type} lattice.
- Width-correct join statements (width is implicit and unchecked).
- SQL type-rejection (currently evalAnd .int .int returns .int n for totality).
Other missing operators:
- TopK (sort + limit, needs ordering on Row).
- FlatMap (table-valued function).
- Constant (literal stream — would unlock fold_constants).
- Get / Let / LetRec (named bindings, fixpoint for recursive queries).
- ArrangeBy (key-indexed view — physical but cited by passes).
Equivalence-class lattice for equivalence_propagation. Currently only use-sites modelable.
Monotonicity / NoRetraction predicate for monotonic.rs. Streams that never retract.

Tier 3: correctness reach

Bridge to actual Materialize code. No translation function `MirRelationExpr → UnifiedStream` composition, no soundness statement "the Rust optimizer's rewrites are valid under this semantics." All current theorems are about an idealized model.
Real time / frontier semantics. TimedUnifiedStream has Nat times. Missing:
- Partial order on times.
- Frontier (antichain) type.
- since / upper bound machinery.
- Capability / retraction-as-of-time laws.
Fixpoint semantics. No model of LetRec / iterate. WITH RECURSIVE is invisible to the spec.
Differential dataflow proper. cross is one-shot Cartesian product, not arrangement-based. No worker partitioning. Mostly physical, but the logical retraction semantics deserve modeling.

Recommendation order

Tier 1 stream-equivalence + Reduce + separate Map unlocks the most. Schema enables a different class of analyses cleanly. Bridge to MirRelationExpr is the biggest reach but the only way the spec becomes load-bearing for real review.

Adds an Int × ErrCount component to the diff so row-scoped errs preserve row content AND retract through normal diff arithmetic. Replaces the previous design where row-scoped errs went through DataflowError-with-row-drop (lossy) or absorbing-marker (terminal). Two-layer encoding: * val (Int × ErrCount): retractable, used for row-scoped errs. * global: absorbing, used for collection-scoped errs only. Covers WHERE / join-predicate err semantics, multiplication rule under cross/join, retraction symmetry, and the rejected alternatives (carrier replacement, absorbing-only). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Foundation for the design-doc refactor toward non-absorbing row-scoped errs. ErrCount := EvalError → Int (pointwise add/neg, scalar mul, single). Diff := { val : Int, errs : ErrCount } with full commutative-ring-like algebra: * additive: assoc / comm / zero / neg. * multiplicative: (a, m) * (b, n) = (a*b, a • n + b • m). * distributive: mul_add / add_mul. * neg_mul / mul_neg. Zero (0, 0) absorbs *. One (1, 0) is identity for *. New file Mz/DiffErrCount.lean. Adds 33rd build job. The earlier DiffWithError (absorbing) stays in place; the next commit repurposes it as DiffWithGlobal (collection-scoped only). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`DiffWithGlobal := val (Diff) | global` — absorbing only on `global`. Used for collection-scoped errs that cannot retract. The val side wraps the row-scoped `Diff` (Int × ErrCount) from DiffErrCount.lean, which is fully retractable. Mirrors DiffSemiring.lean's absorbing API at the `global` element: absorption laws on +/* (both sides), commutativity, associativity, distributivity, negation, the converse `add_eq_global_left_or_right`. The existing absorbing `DiffWithError α` stays in place; downstream files migrate to DiffWithGlobal in subsequent commits. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Parallel module to UnifiedStream.lean. Carrier := Row (no err constructor); row-scoped errs live in the diff's ErrCount component, collection-scoped errs in DiffWithGlobal.global. * filter: predicate err converts row's valid-count copies into err-count copies via ErrCount.single. Row content preserved. * project: rowAllSafe → row replaced by es.map (eval r), diff unchanged. Unsafe → row preserved, valid count zeroed, per- erroring-scalar err counts via rowErrCount helper. * global diff passes through both operators unchanged. Includes filter_nil, filter_append, project_nil_stream, project_append reduction lemmas. Old UnifiedStream module retained; downstream operators migrate incrementally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three parallel operator modules over UnifiedStream2 (new diff- based err encoding): * Join2: cross, join. Carrier-row concat replaces combineCarrier. Diff via DiffWithGlobal.mul. Includes cross_nil_left/right, cross_append_left, cross_cons_left, cross_length. * SetOps2: unionAll, negate, exceptAll. Subset of old SetOps; clampPositive/clampToOne/distinct/intersectAll defer to later. Includes unionAll_assoc, negate_negate (via DiffWithGlobal.neg_neg), negate_append. * UnifiedConsolidate2: bucket-by-row consolidation. Row carrier is plain Row (no err marker), so consolidateInto keys on Row's DecidableEq. Includes consolidateInto_match/skip and consolidate_singleton. 38 lean build jobs all green. Old modules untouched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mgree

Some notes from our conversation.

mgree · 2026-05-20T19:12:21Z

+skeleton's variants are intentionally small; production
+`EvalError` (in `src/expr/src/scalar.rs`) has many more. -/
+inductive EvalError
+  | placeholder


mgree · 2026-05-20T19:15:04Z

+
+* `lit d`: literal datum.
+* `col i`: reference to column `i` in the surrounding environment.
+* `and a b`, `or a b`: binary logical conjunction and disjunction.


No idea why we'd have this AND andN.

mgree · 2026-05-20T19:15:12Z

+
+A minimal subset of `MirScalarExpr` (see `src/expr/src/scalar.rs`).
+
+The skeleton uses binary `and` and `or` rather than the variadic form


This graf is confusing. We have both in the AST, for some reason, but they're both defined in terms of primitive, binary, false-before-error-before-null semantics for AND.

mgree · 2026-05-20T19:30:29Z

+    | (.row r, d)             =>
+      match eval r pred with
+      | .bool true => [(.row r, d)]
+      | .err e     => [(.err e, d)]


Should be (.row r, .error).

mgree · 2026-05-20T19:31:19Z

+def UnifiedStream.filter (pred : Expr) (us : UnifiedStream) : UnifiedStream :=
+  us.flatMap fun ud => match ud with
+    | (_,      .error)        => [ud]
+    | (.err e, d)             => [(.err e, d)]


Seems weird to have a notion of row-level error as opposed to a row with Datum::error in it.

mgree · 2026-05-20T19:33:26Z

+
+namespace Mz
+
+inductive UnifiedRow where


I think this is not right: if we're using Datum::Error, then I don't think we need a row-level error. (As we talked about, this is the correct model for mz as she exists, but not for the version with Datum::Error.)

* TimedConsolidate2: TimedUnifiedStream2 := List (Row × Nat × DiffWithGlobal). advanceFrontier (max-based), atTime (filterMap drop), consolidateAtTime composing UnifiedStream2.consolidate. * Consolidate2: DiffWithGlobal.sumAll right-fold from 0. Plus sumAll_eq_global_of_mem (absorption), sumAll_val_of_all_val (no-escalation), sumAll_global_inv (reverse direction). 40 lean build jobs green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

TimedUnifiedStream2.consolidateAll + consolidateAtTimeFlat. Cite DiffWithGlobal.sumAll_eq_global_of_mem and sumAll_global_inv for absorption + inversion. Iff forms wrap both directions. 41 lean build jobs green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

FilterFusion2 (331 lines): * predNoRowErr — quantifies over val records only (.global unconstrained). * filter_filter_fuse, filter_idem (unconditional!), filter_comm, filter_eval_eq. * filter_idem now needs no hypothesis: filter operates only on the diff, so a second pass sees the same row and same eval result; ErrCount.single _ 0 absorbs cleanly into prior errs. Demand2 (293 lines): * replaceAtRow always touches the row (no err-carrier carve-out). * filter_replaceAtRow_of_unused — full commute under colReferencesUnused. * project_replaceAtRow_of_unused — full commute under argsColRefUnusedList2 + es.length ≤ n. Width side condition handles project's row-rewrite asymmetry: when safe, output row is es-mapped and replaceAtRow on output touches a different position than on input. Putting n out-of-bounds in the projected row makes Env.replaceAt the identity there. argsColRefUnusedList2 renamed to avoid clash with Demand.lean's identically-named predicate. 43 lean build jobs green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* projsAllSafe predicate + tail/head helpers. * project_project_fuse — strengthened from old: needs both `projsAllSafe es us` AND `projsAllSafe (es'.map (·.subst es)) us`. The second hypothesis is needed because new project preserves the row on unsafe — without it, fused unsafe-branch carrier (r) disagrees with step1-safe-then-step2-unsafe carrier (es.map (eval r)). Equivalent to es' safe on every projected row. * filter_project_pushdown — matches old signature exactly. JoinPushdown2 omitted — the original signature is provably false under the new diff algebra: cross's mul rule `(a*b, a•n + b•m)` loses the right-side err counts when filter pushes down (filter zeroes left val before cross). A correct version needs a `NoRowErrs r` hypothesis or a quotient-by-consolidation equivalence. Deferred. 44 lean jobs green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

antiguru · 2026-05-22T10:38:10Z

Soundness finding: predicate pushdown over `Cross(L, R)` is unsound under the new diff semantics

The Lean refactor toward the design doc's row-scoped retractable err encoding (Diff = Int × ErrCount) surfaced a real semantic gap. The optimizer's predicate_pushdown rewrite filter pred (cross l r) ↝ cross (filter pred l) r (with pred referencing left columns only) does not preserve the output collection's err multiplicities.

Concrete counterexample

l = [(la, val ⟨1, 0⟩)]                 -- 1 valid copy of la, no err counts
r = [(rb, val ⟨0, single e 1⟩)]        -- 0 valid copies of rb + 1 err copy under err e
pred = lit (bool false)                -- trivially false on every row

Cross mul rule: (a, m) * (b, n) = (a*b, a • n + b • m). So
(1, 0) * (0, single e 1) = (0, single e 1) — the valid la joined with the err copy of rb produces one err copy of la++rb under err e. The err propagates through cross as you'd expect.

LHS filter pred (cross l r) = [(la++rb, val ⟨0, single e 1⟩)]. Filter on a bool false eval zeroes val (already 0) and keeps the err count.
RHS cross (filter pred l) r = [(la++rb, val ⟨0, 0⟩)]. filter pred l zeroes left val. Then mul (0, 0) * (0, single e 1) = (0, 0 • single e 1 + 0 • 0) = (0, 0). The err count is dropped.

Mismatch: LHS preserves the err under e; RHS loses it.

Root cause

Cross's mul rule scales right-side err counts by left-side val count. Filter pushdown zeroes left val before crossing — so right-side err copies that were going to be carried by left's valid copies disappear.

Why this is invisible today

In the current production model row-scoped errs route to DataflowError. The err collection of r flows to the join output regardless of the filter — it's a separate stream, not a multiplicity inside the diff. The pushdown rewrite operates only on the data collection, which agrees with the rewrite by construction.

Once row-scoped errs become first-class diff multiplicities (the design's whole point — preserve row content + retract through normal arithmetic), the data path and err path are unified, and the multiplicative rule must account for both. The rewrite as written then changes the output.

Options

Guard the optimizer. predicate_pushdown should not fire over Cross(L, R) when R may carry err diffs (or when its errs component is non-empty). Static analysis can be conservative: assume any operator output that didn't pass through escalateRowErrs may carry err counts.
Add a compensating term. Pushdown emits cross filter_dropped_l r_errs as a side stream and unions into the output. Algebraically:

filter pred (cross l r) = cross (filter pred l) r ⊎ (compensation)

where the compensation captures what falls through. Cleaner in theory, harder in practice.
Stream equivalence quotient. Pushdown holds up to consolidation — same logical collection, different list representation. Requires defining Stream.Equiv (multiset / bag equivalence) as a first-class relation. This is the missing Tier 1 foundation mentioned in the earlier gaps comment.
Restrict the diff semantics. If "row-scoped err multiplicities should never survive past a filter on adjacent dimensions," cross mul rule could drop right errs when left val is zero. But that breaks distributivity and the standard diff-semiring laws — likely worse than the bug.

Recommendation

Option 1 (guard) is the safest near-term move. Option 3 (quotient) is the right long-term answer but requires the foundational work listed in the prior comment.

The design doc should specify which view it picks. Right now §"Joins" says a join predicate's err is symmetric with WHERE's err, but it doesn't address how pushdown rewrites interact with diff-scoped err counts. The model needs that resolution before the optimizer can target the new encoding.

Filed from the Lean attempt: the proof of filter_cross_pushdown_left (which previously succeeded under the old absorbing-only IsPureData hypothesis) does not go through here. Old hypothesis ruled out the counterexample by construction; the new diff makes it expressible.

Adds a "Stream equivalence" subsection between Global-scoped errors and SQL error semantics. Defines the denotational equality on UnifiedStream as: * data-multiplicity equality (per-row sum of val counts), plus * global-marker equality. Per-row err counts (Diff.errs) drop out of the equivalence relation. This resolves the predicate_pushdown over Cross(L, R) soundness gap surfaced by the Lean refactor. Under strict list equality the rewrite is unsound — cross's mul rule loses right-side err counts when filter zeroes left val first. Under the new equivalence the rewrite is sound because data multiplicities are unchanged and SQL observes the data side. Open question added: user-facing surface for err counts, if any. The choice constrains optimizer rewrites. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Defines the denotational equality on UnifiedStream2: * dataMult s r — per-row sum of valid counts. * hasGlobal s — whether any record carries .global. * Equiv s t — dataMult-agreement across all rows + hasGlobal-agreement. Per-row err counts (Diff.errs) drop out of Equiv. Two streams agreeing on data multiplicities and global status are equivalent regardless of how ErrCount is distributed. Foundation lemmas shipped: * Equiv.refl / symm / trans (+ Trans instance). * dataMult_append, hasGlobal_append distribute over ++. * Equiv.append — congruence under concatenation. * Equiv.errsChange_singleton — explicit witness that ErrCount differences don't break Equiv. * Equiv.cons_zero — a (r, val ⟨0, m⟩) record can be absorbed into the tail. Future work (deferred): * Equiv-congruence for filter/project/cross/negate/unionAll. * filter_cross_pushdown_left modulo Equiv (the rewrite that fails under strict equality). * Equiv ↔ multiset bijection. 45 lean jobs green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Per review (mgree). The placeholder variant was a stub left over from initial scaffolding; divisionByZero is the real variant the skeleton needs. Strict.lean's four err-strict counterexample proofs switch to .divisionByZero — any concrete EvalError works to instantiate "some err value." Did NOT do the and/orN rename in the same commit: it's an 11-file, 33-occurrence cascade through ColRefs, Pushdown, MightError, FilterFusion, FilterFusion2 (every match arm on .and/.or constructors). Will report scope separately. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Per review (mgree). Original docstring said the skeleton uses binary and/or "rather than" variadic, which was misleading — the skeleton carries both. New docstring explains why: * Binary and/or is the natural object for the four-value truth tables and the filter-fusion theorems. * Variadic andN/orN matches the runtime MirScalarExpr shape so optimizer rewrites that target the runtime AST can be modeled faithfully. The two share semantics via evalAndN = right-fold of evalAnd, so every binary-form law lifts. The dual representation is intentional, with bounded duplication only on eval, ColRefs, and Pushdown. A future cleanup may collapse the constructors and treat binary as sugar; the doc notes the cost (cascade through ~33 match sites). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Major revision after review feedback (mgree) + design discussion. New baseline (replacing diff-encoded multiplicities): * Stream record shape: (row : List Datum, diff : Int, err_diff : Int). Both ordinary Int. Both retract. Single-stream form equivalent to two-collection (data, errors). * Cell-scoped errs: Datum::Error inside the row (unchanged design). * Row-escalation: operator moves multiplicity from diff to err_diff. DataflowError carries structured payload. * Global-scoped: DiffWithGlobal = val(Int) | global, absorbing only. Predicates rule rewritten in operator-explicit form: * TRUE → (r, diff, err_diff) * FALSE/NULL → (r, 0, err_diff) * ERROR e → (r, 0, err_diff + diff) Joins rule rewritten with two-component multiplication: (rL, dL, eL) × (rR, dR, eR) = (rL ++ rR, dL·dR, dL·eR + eL·dR + eL·eR). Alternatives section: * Per-error-payload diff component (Diff = Int × ErrCount): kept as open alternative, NOT rejected. Records what the per-payload form buys (which-error visibility at diff level, multi-error distinguishability) vs what the baseline buys (Int arithmetic, predicate_pushdown soundness under =, no equivalence quotient). The Lean branch's worked-out ErrCount form preserved as evidence. Open question added: should err-side be single Int or per-EvalError map? Answer depends on a user-facing SQL surface decision. Other deletions: * Stream equivalence section removed (no longer needed under the two-diff baseline; strict list equality works). * "Row-scoped errors: DataflowError plus diff-encoded multiplicities" reverted to a cleaner statement. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Per the revised design doc, the stream record shape is now (row : List Datum, diff : Int, err_diff : Int). The previous work (Diff = Int × ErrCount, DiffWithGlobal, UnifiedStream2, Equiv, fusion/pushdown theorems on the per-error-payload form) is preserved in git history on this branch as a worked alternative. Files deleted (28): * All diff-encoding modules: DiffErrCount, DiffSemiring, DiffWithGlobal, Equiv. * All stream-shape modules: UnifiedStream(2), TimedConsolidate(2), Consolidate(2), UnifiedConsolidate(2), Triple(2). * All operator modules: Join(2), SetOps(2), ErrStream, Aggregate, GroupBy. * All fusion / pushdown / demand modules: JoinPushdown, FilterFusion(2), ProjectFusion(2), Demand(2). Files retained (scalar-level + bag): * Datum, Expr, Eval, Boolean, PrimEval, Laws, Strict (scalar evaluation + truth tables). * ColRefs, Pushdown (column-reference analyses + Expr.subst). * Coalesce, MightError, Variadic, ExprVariadic. * Bag (Row := Env, plus simple bag-level operators). New file: * Mz/Stream.lean — fresh skeleton on the two-diff baseline. StreamRecord {row, diff, err_diff}, filter / project / negate / unionAll / cross with reduction lemmas. Per-record filter follows the design doc's Predicates rule exactly: data drops on false/null/int, migrates to err side on .err (with diff added to err_diff). Cross uses the two-component multiplicative rule. 18 lean build jobs green (was 45 under the previous design). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Per review (mgree). After the restart, the cascade is small enough to absorb. Expr loses the .and / .or constructors; they become sugar: Expr.and a b := Expr.andN [a, b] Expr.or a b := Expr.orN [a, b] The skeleton's AST now matches MirScalarExpr's variadic-only VariadicFunc::{And, Or} shape exactly. Binary surface preserved for laws + proofs via the sugar abbrev. Cascade through 6 files: * Eval.lean — drop .and / .or arms from eval. * ColRefs.lean — drop arms from colReferencesBoundedBy / colReferencesUnused / colShift defs + 5 supporting theorems (eval_append_left_of_bounded, eval_append_right_shift, eval_replaceAt_of_unused, colReferencesBoundedBy_mono, colReferencesUnused_of_bounded, colShift_zero, colShift_add). * Pushdown.lean — drop arms from Expr.subst + eval_subst. * MightError.lean — drop arms from might_error + might_error_sound. * ExprVariadic.lean — eval_andN_binary / eval_orN_binary now trivially rfl under the sugar def. * Laws.lean — eval_and_comm_of_no_might_error and eval_or_comm_of_no_might_error rewritten to go through evalAndN_binary / evalOrN_binary (the singleton-extra-element in evalAndN unfolds via evalAnd_true_right). 18 lean jobs green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…options; evaluation-order equivalence Restructure the Alternatives section by the four semi-orthogonal dimensions of the design space (cell encoding, row / collection encoding, global encoding, evaluation-order equivalence) plus the cross-cutting channel-structure choice. Each subsection lists the chosen option alongside open and rejected alternatives so future iteration has the full inventory in one place. New options surfaced: * Cell: severity-graded `Datum::Error(severity, EvalError)` generalizing the strict/null lattice to a per-cell severity lattice. * Row: sum-type row carrier `Payload = Row(List Datum) | Err(DataflowError)` with a single ordinary `Int` diff, closest to today's two-collection model (`data` / `errors`) merged into one stream signature. * Row: N-component labeled diff `(d_valid, d_eval_err, d_decode_err, ...)`, intermediate between two-diff and per-payload `Int × ErrCount`. * Channel: sidecar error stream keyed by stable RowKey. * Channel: sparse two-diff wire format (default-on optimization). New dimension: evaluation-order equivalence. SQL leaves execution order unspecified outside CASE, so any rewrite that touches errors needs a relation coarser than strict equality. The doc names four candidates — strict `=`, error-set equivalence, refinement preorder (errors as bottom, no-spurious-errors posture), refinement preorder dual (errors as top, PostgreSQL posture) — and lists concrete counterexamples for each: `evalAnd` left-bias on err / err, `evalPlus` associativity on bounded `Int`, predicate pushdown across `Cross`, constant folding inside discarded `AND` arms. The summary table at the end of the Alternatives section lays out every option against retractability, cell-error expressivity, per-record cost, open-universe payloads, and chosen / rejected status. A new open question records the equivalence choice and lists the rewrites it gates.

…orderable SQL eval Mechanize the three equivalence relations on `Datum` named in the design doc's *Evaluation-order equivalence* alternative. The current `eval` function targets strict equality on its `Datum` codomain; the relations here are the layer at which we phrase rewrites whose strict- equality form is false. * `Datum.eqErrSet a b := a = b ∨ (a.IsErr ∧ b.IsErr)` — the minimal upgrade from strict equality. Collapses all `.err` payloads into one equivalence class. Reflexive, symmetric (`eqErrSet_symm`). The headline witness is `evalAnd_err_err_eqErrSet_comm`: `evalAnd` is left-biased on err / err inputs (picks the left payload), so under strict equality swapping operands gives different `Datum`s; under `eqErrSet` commutativity is recovered. * `Datum.refines a b := a = b ∨ a.IsErr` — partial order with errors as the least-defined element. An optimizer rewrite is sound under the "no spurious errors" posture iff `eval e1 ⊑ eval e2` pointwise. Reflexive (`refines_refl`), transitive (`refines_trans`), `err`-refines-anything (`err_refines`). * `Datum.refinesDual a b := a = b ∨ b.IsErr` — dual order with errors as the top. PostgreSQL's "spurious errors permitted" posture. Reflexive (`refinesDual_refl`), transitive (`refinesDual_trans`), anything-refines-err (`refinesDual_err`). `refinesDual_iff_refines_swap` witnesses that the dual is just the reverse of `refines`; the two are kept separately so theorem statements can pick the intuitive direction. Bridge lemmas: `eqErrSet_of_refines_both` and `eqErrSet_of_refinesDual_both` — two-way refinement under either preorder collapses to `eqErrSet`. `eqErrSet_of_eq` and `refines_of_eq` embed strict equality into each. The module's docstrings catalog counterexamples I attempted and could not prove under any candidate relation: * `evalAnd` not strictly commutative on err / err. Hidden in the current skeleton because `EvalError` has only one variant; fires as soon as a second variant is added. Recovered by `eqErrSet`. * `evalPlus` associativity on bounded `Int`. The current `evalPlus` uses unbounded Lean `Int` so the counterexample is moot; lives at the bounded-int boundary where `MAX_INT32 + 1` overflows. Needs the refinement preorder direction LHS ⊑ RHS, not `eqErrSet`. * Predicate pushdown on the err side under `cross`. Fails on record-level carrier mismatch under strict equality and `eqErrSet`-lifted-to-streams. Closes only under value-only equivalence under non-determinism — the motivating use case for lifting `eval` to a non-deterministic semantics in a future iteration. * Preservation of `refines` by binary primitives. Mechanical lift of `Mz/Strict.lean`'s strictness lemmas against `refines` rather than `=`. Out of scope for this module. 19 build jobs, all green. No sorry.

…mple Mechanize the bounded-arithmetic counterexample the design doc names in §"Evaluation-order equivalence" as the motivating example for relaxing strict equality on `Datum`. Where `Mz/Equiv.lean` carried this as documented prose, it is now a live theorem. * `Datum.EvalError` gains an `overflow` variant. Production `EvalError` (in `src/expr/src/scalar.rs`) has many such variants; one is enough to fire the asymmetry. * `evalPlusBounded (max : Int)` in `Mz/PrimEval.lean` mirrors `evalPlus` with a symmetric range `[-max, max]`; an `.int + .int` result outside the range returns `.err .overflow`. Strict on `.err` and `.null` like the unbounded primitive. * `Mz/EquivBounded.lean` discharges the associativity counterexample at `x = max`: - `evalPlusBounded_assoc_max_strict_ne` — strict equality fails. LHS folds `max + 1` first and errs; RHS folds `1 + (-1) = 0` and returns `.int max`. Disjoint constructors. - `evalPlusBounded_assoc_max_refines` — LHS ⊑ RHS under `refines` (errors as bottom). Witnesses the "no spurious errors" posture. - `evalPlusBounded_assoc_max_not_refines_rev` — concrete witness that the reverse direction fails. `.int max` is not `IsErr`. - `evalPlusBounded_assoc_max_refinesDual` — RHS ⊑ LHS under `refinesDual` (errors as top). Witnesses the PostgreSQL posture. The bound is symmetric for proof economy; widening to `[MIN_INT32, MAX_INT32]` is mechanical. No `sorry`. Build green (20 jobs).

…nto image Prepare for indexed-arity stream records by adding Mathlib as a dependency. The dependency is declared via the Reservoir-style `[[require]]` form in `lakefile.toml` pinned to `v4.29.1` — the same tag the project's `lean-toolchain` already targets. * `lakefile.lean` → `lakefile.toml` per the upstream wiki guidance. * `lake-manifest.json` regenerated; `mathlib` resolves to commit `5e932f97dd25` (`v4.29.1`). * `Dockerfile` bakes the prebuilt-olean cache into the image. The `COPY` layer brings only `lakefile.toml`, `lean-toolchain`, and `lake-manifest.json`, so the heavy `lake update && lake exe cache get` step is rebuilt only when those inputs change — not on Mz/ source edits. * `git config --system --add safe.directory '*'` so the agent-uid container can read the root-owned Mathlib checkout without git's `safe.directory` check tripping. Without this lake falls through to "URL has changed" and re-clones every dependency on every run. * `ci/test/lean-semantics.sh` bind-mounts only `Mz/` and `Mz.lean` over the image's placeholders; everything else under `/workspace` is supplied by the image. Lake writes the Mz library's own build artifacts to the container's ephemeral overlay. * `.gitignore` / `.dockerignore` skip `.lake/`. Fresh image build takes ~5 min (clone + cache get). Cached runtime build is ~10 s of lake work plus docker container overhead. 20 jobs green; no semantic content changed in this commit.

Pilot of an arity-indexed stream encoding alongside the existing untyped `Mz/Stream.lean`. Row width is in the type (`RowN n = List.Vector Datum n`), and operator signatures expose arity directly: * `filter : Expr → StreamN n → StreamN n` * `project : List.Vector Expr m → StreamN n → StreamN m` * `cross : StreamN n → StreamN m → StreamN (n + m)` The pilot is intended to answer whether the indexed encoding is easier or harder to prove against than the untyped form, ahead of columns-by-`Fin n` reasoning needed when joins land. Two load- bearing tests run: * `crossOne_assoc` — record-level associativity, with the arity cast applied through `List.Vector.congr`. Proof is ~10 lines split by `StreamRecordN.mk.injEq`: row equation closes by `List.append_assoc` after `Subtype.ext`; diff is `mul_assoc`; err_diff is `ring`. * `cross_assoc` — stream-level associativity. Statement requires the same arity cast lifted record-wise via `castStreamN`. Proof is two inductions (`s` and a sub-induction over `t` factored as `cross_map_left`) plus an inner `List.map_congr_left` and the record-level `crossOne_assoc`. ~25 lines. Reduction lemmas (filter_nil / filter_append / project_append / cross_nil_left / cross_nil_right / cross_cons_left / cross_append_left) follow the same shape as the untyped `Stream.lean` versions; the differences are confined to argument types and the absence of `_ _ _`-style applications because Mathlib's `List.map_append` / `List.flatMap_append` are stated with implicit arguments. Verdict: tolerable cost. The cast machinery (`castStreamRecordN`, `castStreamN`, `Vector.congr`) is small and isolated; the bulk of proof effort goes into the same record arithmetic the untyped form already pays. Next: `filter_cross_pushdown_left` on the indexed form, where the arity bound on the predicate's column references becomes a type-level fact, to compare against the prose obligation in the untyped form. 763 build jobs green (pilot pulls in Mathlib.Data.Vector + Mathlib.Tactic.Ring). No `sorry`.

…lper Attempting `filter_cross_pushdown_left` on the indexed-arity stream surfaces a soundness gap in the two-diff model: the classical relational pushdown `filter p (cross sL sR) = cross (filter p sL) sR` is unsound on the err side whenever the right stream carries non-zero `err_diff`. What works: * `filterOne_cross_pushdown_left_true` — when the predicate evaluates to `.bool true`, filter is the identity and `filterOne` commutes with `crossOne` directly. * `filterOne_cross_pushdown_left_err` — when the predicate evaluates to `.err _`, data multiplicity migrates to err on both sides; the cross-multiplied err_diff sums rearrange to the same value by ring. What does not work: * `.bool false / .null / .int` branches. The cross's err_diff formula multiplies `recL.diff` against `recR.err_diff`; filter zeros `recL.diff` before the cross, dropping the right's err contribution. The LHS retains it because the filter runs *after* the cross has already multiplied the err in. A concrete witness is mechanized as `filterOne_cross_pushdown_left_unsound`: `recL = (·, diff = 1, err_diff = 0)`, `recR = (·, diff = 0, err_diff = 1)`, `p = .lit (.bool false)`. LHS err_diff = 1, RHS err_diff = 0. Operationally: the right stream's error is unconditional — it surfaces regardless of the left's predicate. Pushdown undercounts right-side errors. Closing the obligation requires either a non-deterministic semantics that routes right's err through a channel filter does not multiply against, a refinement equivalence that ignores err_diff drops, or a stream encoding where errors are not multiplied through cross. This mirrors the per-payload-diff finding documented in `Mz/Equiv.lean` for predicate pushdown over cross; the indexed-arity form makes the obligation precise enough to expose the same gap on the two-diff model. Also adds `bin/in-image`, a reusable wrapper around the `docker run` invocation that bind-mounts `Mz/` and `Mz.lean` over the image's placeholders. Lets probes against Mathlib sources or one-shot lake commands reuse the prebuilt cache without retyping the docker incantation. 763 jobs green. No `sorry`.

antiguru and others added 30 commits May 18, 2026 23:17

antiguru and others added 5 commits May 18, 2026 23:17

doc/semantics: transforms.md, idem + comm filter laws

7f098a1

Cite filter_idem (unconditional) and filter_comm (under err-freedom) alongside filter_filter_fuse. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

antiguru and others added 5 commits May 20, 2026 11:45

mgree reviewed May 20, 2026

View reviewed changes

antiguru and others added 4 commits May 20, 2026 16:08

antiguru and others added 13 commits May 22, 2026 06:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

doc/semantics: Lean 4 mechanization of UnifiedStream error semantics#36614

doc/semantics: Lean 4 mechanization of UnifiedStream error semantics#36614
antiguru wants to merge 141 commits into
MaterializeInc:mainfrom
antiguru:lean-semantics-skeleton

antiguru commented May 18, 2026 •

edited

Loading

Uh oh!

antiguru commented May 18, 2026

Uh oh!

mgree left a comment

Uh oh!

mgree May 20, 2026

Uh oh!

mgree May 20, 2026

Uh oh!

mgree May 20, 2026

Uh oh!

mgree May 20, 2026

Uh oh!

mgree May 20, 2026

Uh oh!

mgree May 20, 2026

Uh oh!

antiguru commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants


		A minimal subset of `MirScalarExpr` (see `src/expr/src/scalar.rs`).

		The skeleton uses binary `and` and `or` rather than the variadic form

Conversation

antiguru commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Current state (post-restart)

What's in the skeleton today

What's preserved in branch history (not in current tree)

Major design-doc changes in this PR

Build

Uh oh!

antiguru commented May 18, 2026

Foundational gaps

Tier 1: blocks most remaining modelable passes

Tier 2: specific pass clusters

Tier 3: correctness reach

Recommendation order

Uh oh!

mgree left a comment

Choose a reason for hiding this comment

Uh oh!

mgree May 20, 2026

Choose a reason for hiding this comment

Uh oh!

mgree May 20, 2026

Choose a reason for hiding this comment

Uh oh!

mgree May 20, 2026

Choose a reason for hiding this comment

Uh oh!

mgree May 20, 2026

Choose a reason for hiding this comment

Uh oh!

mgree May 20, 2026

Choose a reason for hiding this comment

Uh oh!

mgree May 20, 2026

Choose a reason for hiding this comment

Uh oh!

antiguru commented May 22, 2026

Soundness finding: predicate pushdown over Cross(L, R) is unsound under the new diff semantics

Concrete counterexample

Root cause

Why this is invisible today

Options

Recommendation

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

antiguru commented May 18, 2026 •

edited

Loading

Soundness finding: predicate pushdown over `Cross(L, R)` is unsound under the new diff semantics