refactor(drive): unify AVG no-prove dispatch into a single count+sum walk#3690
Conversation
…walk (#3687) Closes #3687. The AVG dispatcher's no-prove path previously composed parallel `DocumentCountRequest` + `DocumentSumRequest` calls under a shared read transaction and zipped the responses. Correct but double-walked grovedb per AVG query and required count's and sum's routing tables to stay in lock-step (PR #3661 caught one drift). A new `Drive::execute_document_count_and_sum_request` reads `(count, sum)` from each visited PCPS / CountSumTree element in one walk via `Element::count_sum_value_or_default()`. Mode resolution reuses sum's existing versioned routing table so a single source of truth governs the (where × mode) → executor decision. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
📖 Book Preview built successfully. Download the preview from the workflow artifacts. Updated at 2026-05-20T04:01:01.657Z |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughReplaces AVG no‑prove compose‑and‑zip with a unified ChangesAVG No‑Prove Single‑Walk Unification
Sequence DiagramsequenceDiagram
participant C as Client
participant A as AvgDispatcher
participant J as CountSumDispatcher
participant E as Executor
participant G as GroveDB
C->>A: DocumentAverageRequest (prove=false)
A->>J: forward unified request
J->>E: route by DocumentSumMode
E->>G: single grove_get / grove_query walk (PCPS elements)
G-->>E: elements with count_sum_value_or_default()
E-->>J: DocumentAverageResponse (Aggregate or Entries)
J-->>A: return response
A-->>C: final DocumentAverageResponse
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
packages/rs-drive/src/query/drive_document_count_and_sum_query/drive_dispatcher.rs (1)
177-190: ⚡ Quick winEnforce mode→shape invariants instead of permissive passthrough.
The fallback
(_, resp) => Ok(resp)can silently leak a wrong response shape for a mode if a downstream executor changes behavior. Fail loud here to preserve the wire contract.Proposed guard-rail diff
- match (request.mode, response) { - (AverageMode::Aggregate, resp) => Ok(resp), + match (request.mode, response) { + ( + AverageMode::Aggregate, + resp @ DocumentAverageResponse::Aggregate { .. }, + ) => Ok(resp), (AverageMode::GroupByIn, DocumentAverageResponse::Aggregate { count, sum }) => { Ok(DocumentAverageResponse::Entries(vec![ super::super::drive_document_average_query::AverageEntry { in_key: None, key: Vec::new(), count: Some(count), sum: Some(sum), }, ])) } - (_, resp) => Ok(resp), + ( + AverageMode::GroupByRange | AverageMode::GroupByCompound, + resp @ DocumentAverageResponse::Entries(_), + ) => Ok(resp), + (mode, _) => Err(Error::Drive( + crate::error::drive::DriveError::CorruptedCodeExecution( + match mode { + AverageMode::Aggregate => "unexpected non-aggregate response for Aggregate mode", + AverageMode::GroupByIn => "unexpected non-aggregate response for GroupByIn mode", + AverageMode::GroupByRange | AverageMode::GroupByCompound => + "unexpected non-entries response for grouped range mode", + }, + ), + )), }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/rs-drive/src/query/drive_document_count_and_sum_query/drive_dispatcher.rs` around lines 177 - 190, The match currently in drive_dispatcher.rs permissively falls through with (_, resp) => Ok(resp), which can leak incorrect response shapes; replace this permissive fallback with explicit shape checks and return a clear error when the combination of AverageMode (e.g., AverageMode::Aggregate, AverageMode::GroupByIn) and DocumentAverageResponse (e.g., DocumentAverageResponse::Aggregate, DocumentAverageResponse::Entries) is invalid. Concretely, keep the existing arms for (AverageMode::Aggregate, DocumentAverageResponse::Aggregate) and (AverageMode::GroupByIn, DocumentAverageResponse::Aggregate -> convert to Entries) but remove the general (_, resp) => Ok(resp) and instead return an Err (or propagate a typed error) describing the unexpected mode/response pair (include the mode and response variant in the error) so any downstream mismatch fails loudly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In
`@packages/rs-drive/src/query/drive_document_count_and_sum_query/drive_dispatcher.rs`:
- Around line 177-190: The match currently in drive_dispatcher.rs permissively
falls through with (_, resp) => Ok(resp), which can leak incorrect response
shapes; replace this permissive fallback with explicit shape checks and return a
clear error when the combination of AverageMode (e.g., AverageMode::Aggregate,
AverageMode::GroupByIn) and DocumentAverageResponse (e.g.,
DocumentAverageResponse::Aggregate, DocumentAverageResponse::Entries) is
invalid. Concretely, keep the existing arms for (AverageMode::Aggregate,
DocumentAverageResponse::Aggregate) and (AverageMode::GroupByIn,
DocumentAverageResponse::Aggregate -> convert to Entries) but remove the general
(_, resp) => Ok(resp) and instead return an Err (or propagate a typed error)
describing the unexpected mode/response pair (include the mode and response
variant in the error) so any downstream mismatch fails loudly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4932f885-48c2-494d-951b-273c889f5320
📒 Files selected for processing (10)
book/src/drive/average-index-examples.mdpackages/rs-drive/src/query/drive_document_average_query/drive_dispatcher.rspackages/rs-drive/src/query/drive_document_average_query/mod.rspackages/rs-drive/src/query/drive_document_count_and_sum_query/drive_dispatcher.rspackages/rs-drive/src/query/drive_document_count_and_sum_query/executors/mod.rspackages/rs-drive/src/query/drive_document_count_and_sum_query/executors/per_in_value.rspackages/rs-drive/src/query/drive_document_count_and_sum_query/executors/range_no_proof.rspackages/rs-drive/src/query/drive_document_count_and_sum_query/executors/total.rspackages/rs-drive/src/query/drive_document_count_and_sum_query/mod.rspackages/rs-drive/src/query/mod.rs
…cher Address CodeRabbit review feedback on #3690. The permissive `(_, resp) => Ok(resp)` fallback in the RangeNoProof second-stage shape adapter could silently leak a wrong response shape if a downstream executor changed behavior. Replace with an exhaustive match: every legal mode×shape pairing is named explicitly, and any other combination surfaces as `CorruptedCodeExecution` with a mode-specific message rather than forwarding the unexpected shape to the gRPC handler. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Addressed in e7be89d: replaced the permissive |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@packages/rs-drive/src/query/drive_document_count_and_sum_query/drive_dispatcher.rs`:
- Around line 75-97: The execute_document_count_and_sum_request currently always
routes with prove=false but accepts a generic DocumentAverageRequest; add an
upfront guard in execute_document_count_and_sum_request that rejects requests
with request.prove == true (returning an appropriate Error/invalid-argument
error) so callers cannot request proofs here, or alternatively change the
function signature to accept an explicitly no-proof request type; place the
check before converting AverageMode → SumMode and before calling
detect_sum_mode_from_inputs to enforce the no-prove precondition at the API
boundary.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 48e44735-0284-4f5b-8940-b558ec1c3147
📒 Files selected for processing (1)
packages/rs-drive/src/query/drive_document_count_and_sum_query/drive_dispatcher.rs
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
The refactor consolidates AVG no-prove dispatch into a single count+sum walk and is structurally sound. Verified two convergent issues: the joint dispatcher hard-codes carrier_outer_limit: None and never reads request.limit, breaking the documented limit contract for distinct GroupByRange/GroupByCompound; and the flat-aggregate range branch trades sum's O(log n) merk accumulator for an O(n) Rust fold over every PCPS terminator. Also flagged: bypass of validate_and_canonicalize_where_clauses that count's dispatcher used to apply on the AVG composition path, an uncovered primary-key fast path, and a misleading drop-predicate comment.
🔴 1 blocking | 🟡 3 suggestion(s) | 💬 1 nitpick(s)
5 additional finding(s)
blocking: AVG no-prove dispatcher silently drops `request.limit`, breaking the documented limit contract on distinct grouped modes
packages/rs-drive/src/query/drive_document_count_and_sum_query/drive_dispatcher.rs (line 119)
Both the PerInValue and RangeNoProof arms build RangeSumOptions { carrier_outer_limit: None, .. } and never read request.limit or request.drive_config. The executor then calls sum_query.distinct_sum_path_query(None::<u16>, ...), so the outer walk is effectively unbounded.
This contradicts the contract documented on DocumentAverageRequest::limit (drive_document_average_query/mod.rs:121-145): on the no-proof path, unset limit is supposed to fall back to drive_config.default_query_limit and explicit limit > max_query_limit is supposed to be clamped to max_query_limit. Count's dispatcher honoured this (drive_document_count_query::drive_dispatcher.rs:496-507) and was the side enforcing the clamp on the pre-#3690 AVG composition path; with the joint dispatcher there's no enforcement at all.
User-visible effect: SELECT AVG(...) GROUP BY <range field> LIMIT n now returns more than n rows, and broad GroupByRange/GroupByCompound queries over high-cardinality indexes will walk and serialise every matched terminator regardless of the operator's max_query_limit. Compute and propagate an effective_limit = request.limit.unwrap_or(drive_config.default_query_limit).min(drive_config.max_query_limit) into RangeSumOptions::carrier_outer_limit (and into distinct_sum_path_query) for the distinct shapes, matching count's behaviour.
suggestion: Flat-aggregate range branch replaces sum's O(log n) merk accumulator with an O(n) Rust fold over every PCPS terminator
packages/rs-drive/src/query/drive_document_count_and_sum_query/executors/range_no_proof.rs (line 110)
For AverageMode::Aggregate + range (and the compound GroupByIn + range fold), the joint executor unconditionally routes through distinct_sum_path_query(...) + grove_get_raw_path_query(...) and folds (count, sum) for every visited element in Rust (lines 124-188).
Sum's pre-#3690 flat-aggregate path used grove.query_aggregate_sum, the merk-internal AggregateSumOnRange primitive that returns a single i64 in O(log n) (execute_range_sum.rs:137-148). Count's flat-aggregate path used the analogous query_aggregate_count. The pre-#3690 AVG composition therefore did two O(log n) accumulator reads; this PR replaces them with one O(n) walk over every distinct in-range terminator.
The module docstring (lines 8-29) acknowledges this as a known tradeoff pending a combined query_aggregate_count_and_sum grovedb primitive, and the PR description's "halving the grovedb work per AVG query" claim is accurate for point-lookup / PerInValue / distinct shapes — but it is an asymptotic regression on flat-aggregate range over high-cardinality indexes that is reachable from an unauthenticated DAPI client. Until the grovedb-side primitive lands, consider keeping the flat-aggregate non-distinct range branch on the parallel accumulator composition (query_aggregate_sum + query_aggregate_count under a shared transaction) and reserving the joint walk for shapes that already require materialising distinct keys. At minimum, soften the PR description's perf claim and call this branch out explicitly in the dispatcher docstring so the regression is visible to operators.
suggestion: AVG no-prove path no longer runs `validate_and_canonicalize_where_clauses`
packages/rs-drive/src/query/drive_document_count_and_sum_query/drive_dispatcher.rs (line 75)
Pre-#3690 the AVG no-prove dispatcher composed DocumentCountRequest + DocumentSumRequest, and count's dispatcher (drive_document_count_query::drive_dispatcher.rs:432) ran validate_and_canonicalize_where_clauses(request.where_clauses)? which rejects malformed shapes (duplicate Equal on the same field, multiple Ins, contradictory ranges via group_clauses) and canonicalises [field > A, field < B] pairs into a single field between* [A, B] clause. Sum's dispatcher never validated, so the count side was the validator on the AVG composition path.
The new execute_document_count_and_sum_request forwards request.where_clauses to the joint executors untouched. Malformed AVG queries that count previously rejected now reach the executors directly and either succeed with arbitrarily-reduced results (e.g., find_summable_index_for_where_clauses collapses duplicate same-field clauses via BTreeSet) or fail downstream with less precise errors. AVG queries that previously had [> A, < B] canonicalised into between* may now hit a different index-picker decision. Run the validator either inside the joint dispatcher entry point or one level up in execute_document_average_request, so the AVG surface keeps the accept/reject contract it had pre-#3690.
suggestion: Empty-where `read_primary_key_count_sum_tree` fast path is not covered by the new parity tests
packages/rs-drive/src/query/drive_document_count_and_sum_query/executors/total.rs (line 191)
The author explicitly notes inside joint_total_executor_matches_independent_count_plus_sum (drive_dispatcher.rs:990-998): "the empty-where branch requires documents_summable; if the doctype lacks it, the picker is invoked with empty where, which find_summable_index_for_where_clauses rejects ... So we test Equal-only-fully-covered instead." That means the where_clauses.is_empty() && document_type.documents_countable() && documents_summable() == Some(sum_property) fast path and the entire read_primary_key_count_sum_tree helper (which raises CorruptedCodeExecution if the primary-key element is missing) are exercised by zero tests in this PR.
This is the consensus-critical doctype-level fast path; a regression here would silently produce wrong (count, sum) aggregates for the most-trafficked AVG shape (unfiltered total). Extend the test fixture so a doctype with both documents_summable and documents_countable can be produced, then add a parity test directly against the empty-where Total branch.
nitpick: Drop predicate `count == 0 && sum == 0` does not actually match sum's distinct contract
packages/rs-drive/src/query/drive_document_count_and_sum_query/executors/range_no_proof.rs (line 197)
The inline comment claims the conjunctive predicate "matches sum's distinct contract (which drops sum == 0 rows)", but sum's distinct executor at drive_document_sum_query/execute_range_sum.rs:187 drops solely on sum == 0 regardless of count. The joint predicate count == 0 && sum == 0 is strictly weaker: a row with count > 0, sum == 0 (e.g., several documents whose property value is 0, or whose signed values cancel) is kept by the joint executor but would have been dropped by sum's standalone distinct dispatch.
The widening is arguably more correct for AVG semantics — count > 0 is informative even when the sum is zero — but the comment as written is wrong. Either reword to acknowledge the intentional widening (sum drops on sum == 0; joint AVG drops only when the row is fully empty so as to preserve count == N/sum == 0 rows), or tighten the predicate to match sum literally. None of the parity tests exercise a sum == 0, count > 0 row, so the divergence is currently unobserved.
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
- [BLOCKING] In `packages/rs-drive/src/query/drive_document_count_and_sum_query/drive_dispatcher.rs`:119-158: AVG no-prove dispatcher silently drops `request.limit`, breaking the documented limit contract on distinct grouped modes
Both the `PerInValue` and `RangeNoProof` arms build `RangeSumOptions { carrier_outer_limit: None, .. }` and never read `request.limit` or `request.drive_config`. The executor then calls `sum_query.distinct_sum_path_query(None::<u16>, ...)`, so the outer walk is effectively unbounded.
This contradicts the contract documented on `DocumentAverageRequest::limit` (`drive_document_average_query/mod.rs:121-145`): on the no-proof path, unset `limit` is supposed to fall back to `drive_config.default_query_limit` and explicit `limit > max_query_limit` is supposed to be clamped to `max_query_limit`. Count's dispatcher honoured this (`drive_document_count_query::drive_dispatcher.rs:496-507`) and was the side enforcing the clamp on the pre-#3690 AVG composition path; with the joint dispatcher there's no enforcement at all.
User-visible effect: `SELECT AVG(...) GROUP BY <range field> LIMIT n` now returns more than `n` rows, and broad GroupByRange/GroupByCompound queries over high-cardinality indexes will walk and serialise every matched terminator regardless of the operator's `max_query_limit`. Compute and propagate an `effective_limit = request.limit.unwrap_or(drive_config.default_query_limit).min(drive_config.max_query_limit)` into `RangeSumOptions::carrier_outer_limit` (and into `distinct_sum_path_query`) for the distinct shapes, matching count's behaviour.
- [SUGGESTION] In `packages/rs-drive/src/query/drive_document_count_and_sum_query/executors/range_no_proof.rs`:110-188: Flat-aggregate range branch replaces sum's O(log n) merk accumulator with an O(n) Rust fold over every PCPS terminator
For `AverageMode::Aggregate` + range (and the compound `GroupByIn` + range fold), the joint executor unconditionally routes through `distinct_sum_path_query(...)` + `grove_get_raw_path_query(...)` and folds `(count, sum)` for every visited element in Rust (lines 124-188).
Sum's pre-#3690 flat-aggregate path used `grove.query_aggregate_sum`, the merk-internal `AggregateSumOnRange` primitive that returns a single `i64` in O(log n) (`execute_range_sum.rs:137-148`). Count's flat-aggregate path used the analogous `query_aggregate_count`. The pre-#3690 AVG composition therefore did two O(log n) accumulator reads; this PR replaces them with one O(n) walk over every distinct in-range terminator.
The module docstring (lines 8-29) acknowledges this as a known tradeoff pending a combined `query_aggregate_count_and_sum` grovedb primitive, and the PR description's "halving the grovedb work per AVG query" claim is accurate for point-lookup / PerInValue / distinct shapes — but it is an asymptotic regression on flat-aggregate range over high-cardinality indexes that is reachable from an unauthenticated DAPI client. Until the grovedb-side primitive lands, consider keeping the flat-aggregate non-distinct range branch on the parallel accumulator composition (`query_aggregate_sum + query_aggregate_count` under a shared transaction) and reserving the joint walk for shapes that already require materialising distinct keys. At minimum, soften the PR description's perf claim and call this branch out explicitly in the dispatcher docstring so the regression is visible to operators.
- [SUGGESTION] In `packages/rs-drive/src/query/drive_document_count_and_sum_query/drive_dispatcher.rs`:75-108: AVG no-prove path no longer runs `validate_and_canonicalize_where_clauses`
Pre-#3690 the AVG no-prove dispatcher composed `DocumentCountRequest` + `DocumentSumRequest`, and count's dispatcher (`drive_document_count_query::drive_dispatcher.rs:432`) ran `validate_and_canonicalize_where_clauses(request.where_clauses)?` which rejects malformed shapes (duplicate `Equal` on the same field, multiple `In`s, contradictory ranges via `group_clauses`) and canonicalises `[field > A, field < B]` pairs into a single `field between* [A, B]` clause. Sum's dispatcher never validated, so the count side was the validator on the AVG composition path.
The new `execute_document_count_and_sum_request` forwards `request.where_clauses` to the joint executors untouched. Malformed AVG queries that count previously rejected now reach the executors directly and either succeed with arbitrarily-reduced results (e.g., `find_summable_index_for_where_clauses` collapses duplicate same-field clauses via `BTreeSet`) or fail downstream with less precise errors. AVG queries that previously had `[> A, < B]` canonicalised into `between*` may now hit a different index-picker decision. Run the validator either inside the joint dispatcher entry point or one level up in `execute_document_average_request`, so the AVG surface keeps the accept/reject contract it had pre-#3690.
- [SUGGESTION] In `packages/rs-drive/src/query/drive_document_count_and_sum_query/executors/total.rs`:191-223: Empty-where `read_primary_key_count_sum_tree` fast path is not covered by the new parity tests
The author explicitly notes inside `joint_total_executor_matches_independent_count_plus_sum` (`drive_dispatcher.rs:990-998`): "the empty-where branch requires documents_summable; if the doctype lacks it, the picker is invoked with empty where, which `find_summable_index_for_where_clauses` rejects ... So we test Equal-only-fully-covered instead." That means the `where_clauses.is_empty() && document_type.documents_countable() && documents_summable() == Some(sum_property)` fast path and the entire `read_primary_key_count_sum_tree` helper (which raises `CorruptedCodeExecution` if the primary-key element is missing) are exercised by zero tests in this PR.
This is the consensus-critical doctype-level fast path; a regression here would silently produce wrong `(count, sum)` aggregates for the most-trafficked AVG shape (unfiltered total). Extend the test fixture so a doctype with both `documents_summable` and `documents_countable` can be produced, then add a parity test directly against the empty-where Total branch.
- [NITPICK] In `packages/rs-drive/src/query/drive_document_count_and_sum_query/executors/range_no_proof.rs`:197-203: Drop predicate `count == 0 && sum == 0` does not actually match sum's distinct contract
The inline comment claims the conjunctive predicate "matches sum's distinct contract (which drops `sum == 0` rows)", but sum's distinct executor at `drive_document_sum_query/execute_range_sum.rs:187` drops solely on `sum == 0` regardless of count. The joint predicate `count == 0 && sum == 0` is strictly weaker: a row with `count > 0, sum == 0` (e.g., several documents whose property value is 0, or whose signed values cancel) is kept by the joint executor but would have been dropped by sum's standalone distinct dispatch.
The widening is arguably more correct for AVG semantics — `count > 0` is informative even when the sum is zero — but the comment as written is wrong. Either reword to acknowledge the intentional widening (sum drops on `sum == 0`; joint AVG drops only when the row is fully empty so as to preserve `count == N`/`sum == 0` rows), or tighten the predicate to match sum literally. None of the parity tests exercise a `sum == 0, count > 0` row, so the divergence is currently unobserved.
Inline posting hit GitHub HTTP 422, so I posted the same verified findings as a top-level review body.
…3690) Two reviewer findings on #3690: P1: The joint RangeNoProof executor walked every matched PCPS element via `distinct_sum_path_query` and folded `(count, sum)` in Rust for *all* shapes including aggregate. That turned `Aggregate + range` and `GroupByIn + range` into O(matching range keys) reads — a public DAPI request-amplification surface that the pre-#3687 dispatcher avoided by calling grovedb's merk-internal `query_aggregate_count` / `query_aggregate_sum` accumulators (each O(log n)). The fix restores the merk-internal accumulator calls for the aggregate shapes (one count call + one sum call per branch, under a shared read transaction for atomicity). The single-PCPS-walk win still applies on the distinct shapes (`GroupByRange + range`, `GroupByCompound + range`) where one walk yields both axes per visited element vs. two parallel walks zipped post-hoc — which was the original perf rationale in #3687. Compound `(In + range)` per-In fans out (bounded by `In::in_values()`'s 100-cap × 2 accumulator reads per branch), matching count and sum's existing executors. A future optional grovedb optimization could expose a combined no-proof `query_aggregate_count_and_sum` primitive collapsing the two calls into one merk-internal accumulator; #3687 captures the follow-up. P2: The joint dispatcher hard-coded `None` into the distinct walk's `distinct_sum_path_query` limit, ignoring `DocumentAverageRequest::limit` on the no-proof distinct branches. A request without `limit` now falls back to `drive_config.default_query_limit` and an explicit `limit > max_query_limit` clamps to `max_query_limit`, matching count's no-proof distinct policy and the documented contract on `DocumentAverageRequest::limit`. Three new regression tests pin the limit behavior: explicit limit, `None` defaults to `default_query_limit`, and over-max clamps to `max_query_limit`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Addressed both findings in 0f6512e: P1: Replaced the unbounded PCPS raw-walk on the aggregate shapes with grovedb's bounded merk-internal accumulators. P2: The joint dispatcher now derives Aggregate branches deliberately drop Tests: |
|
✅ DashSDKFFI.xcframework built for this PR.
SwiftPM (host the zip at a stable URL, then use): .binaryTarget(
name: "DashSDKFFI",
url: "https://your.cdn.example/DashSDKFFI.xcframework.zip",
checksum: "84f3a83af503cd2af74d0143add52ed60b359dfa924d6fc5e1d5beba639c8100"
)Xcode manual integration:
|
CodeRabbit + thepastaclaw findings on #3690: - Joint dispatcher now rejects direct callers passing `prove = true` with `CorruptedCodeExecution` rather than silently producing a no-proof response. The wrapper `execute_document_average_request` is the only legitimate entry that routes prove requests (to the prove-side dispatcher); reaching the joint dispatcher with `prove = true` is a wiring bug. - Joint dispatcher now runs `validate_and_canonicalize_where_clauses` on the input, restoring the shape contract the pre-#3690 AVG composition path inherited from count's dispatcher (rejects malformed shapes like duplicate Equal on the same field; canonicalizes `[> A, < B]` pairs into `between*`). Without it, malformed AVG queries that count previously rejected reached the joint executors with arbitrarily-collapsed shapes. - `PerInValue` executor now honors `request.limit`, mirroring count's per-In policy. The bound is `min(request.limit, max_query_limit)`; unset `limit` leaves the natural In-cap (100) in place. Reviewer's finding was framed as a blocker on RangeNoProof + PerInValue together; RangeNoProof was already fixed in 0f6512e, this closes the remaining PerInValue half. - Reworded the drop-predicate comment in `range_no_proof.rs` to acknowledge the intentional divergence from sum's `sum == 0` predicate. The joint executor preserves `count = N, sum = 0` rows (informative for AVG: a group with N docs whose values sum to zero is meaningful) and only drops grovedb-absent `(0, 0)` rows. Four new regression tests pin: prove=true guard, validator activation, PerInValue limit truncation, and the empty-where primary-key count-sum-tree fast path (the consensus-critical doctype-level Total shape that no prior parity test covered). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Addressed @thepastaclaw findings in 5ed1f0d (the limit drop + flat-aggregate amplification were already addressed in 0f6512e before this review landed — the review was on [blocking] Limit drops — [suggestion] Flat-aggregate range O(n) fold — Fixed in 0f6512e: [suggestion] [suggestion] Empty-where Total fast path test — Added [nitpick] Drop predicate comment — Reworded the inline comment in Test counts: |
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
All five findings converge across both reviewers (claude + codex) and are confirmed against the checked-out SHA e7be89d. The most recent delta (RangeNoProof shape pairing) is a defensive improvement but does not address any prior issue. One blocking limit-contract regression remains on the AVG no-prove dispatcher, plus four supporting findings around an O(n) Rust fold, bypassed where-clause validation, missing parity coverage of the consensus-critical empty-where fast path, and a misleading drop-predicate comment.
🔴 1 blocking | 🟡 3 suggestion(s) | 💬 1 nitpick(s)
5 additional finding(s)
blocking: AVG no-prove dispatcher drops request.limit on grouped range paths
packages/rs-drive/src/query/drive_document_count_and_sum_query/drive_dispatcher.rs (line 119)
Both the PerInValue arm (lines 120-124) and the RangeNoProof arm (lines 145-149) build RangeSumOptions { carrier_outer_limit: None, .. } and never read request.limit or request.drive_config. The executor in range_no_proof.rs:124-128 then calls sum_query.distinct_sum_path_query(None::<u16>, ...), leaving the outer walk unbounded. This contradicts the documented DocumentAverageRequest::limit contract on the no-proof path (drive_document_average_query/mod.rs:121-145), which specifies that unset limit falls back to drive_config.default_query_limit and explicit limit > max_query_limit clamps to max_query_limit. Count's dispatcher honors this at drive_document_count_query/drive_dispatcher.rs:496-503; the joint AVG dispatcher has no equivalent. User-visible effect: SELECT AVG(...) GROUP BY <range field> LIMIT n returns more than n rows, and broad GroupByRange/GroupByCompound queries on high-cardinality rangeAverageable indexes walk and serialize every matched terminator regardless of the operator's max_query_limit. The path is reachable from unauthenticated DAPI traffic, making this both a correctness regression and a DoS amplification surface. Compute effective_limit = request.limit.unwrap_or(request.drive_config.default_query_limit as u32).min(request.drive_config.max_query_limit as u32) for the distinct shapes and propagate it into RangeSumOptions::carrier_outer_limit and the downstream distinct_sum_path_query call.
suggestion: Flat-aggregate range branch replaces O(log n) merk accumulator with O(n) Rust fold
packages/rs-drive/src/query/drive_document_count_and_sum_query/executors/range_no_proof.rs (line 110)
For AverageMode::Aggregate + range and the compound GroupByIn + range fold (!options.return_distinct_sums_in_range branch, lines 158-187), the executor routes through distinct_sum_path_query + grove_get_raw_path_query and folds (count, sum) over every visited PCPS terminator in Rust. Sum's pre-refactor flat-aggregate path used grove.query_aggregate_sum (the merk-internal AggregateSumOnRange primitive, O(log n) — returns a single i64), and count's analog used query_aggregate_count. The pre-refactor AVG composition therefore performed two O(log n) accumulator reads; this PR replaces them with one O(n) walk over every distinct in-range terminator. The module docstring (lines 8-29) acknowledges this as a known tradeoff pending a combined grovedb query_aggregate_count_and_sum primitive, but the asymptotic regression is real on broad range queries reachable from untrusted DAPI input — and the PR description's blanket 'halving the grovedb work per AVG query' claim does not hold on this branch. Until the grovedb-side primitive lands, consider keeping the flat-aggregate non-distinct range branch on the parallel query_aggregate_sum + query_aggregate_count composition under a shared transaction, and reserve the joint walk for shapes that already require materializing distinct keys.
suggestion: AVG no-prove path no longer runs validate_and_canonicalize_where_clauses
packages/rs-drive/src/query/drive_document_count_and_sum_query/drive_dispatcher.rs (line 75)
execute_document_count_and_sum_request forwards request.where_clauses to detect_sum_mode_from_inputs and the joint executors verbatim (line 97, 101), and execute_document_average_request (drive_document_average_query/drive_dispatcher.rs:83-86) does no validation either. Pre-refactor the AVG no-prove path composed DocumentCountRequest + DocumentSumRequest, and count's dispatcher (drive_document_count_query/drive_dispatcher.rs:432) ran validate_and_canonicalize_where_clauses(request.where_clauses)?, which rejects malformed shapes (duplicate Equal on the same field, multiple Ins, contradictory ranges) and canonicalizes [field > A, field < B] pairs into a single field between* [A, B] clause. Sum never validated; count was the validator on the AVG composition path. As a result, malformed AVG queries that count previously rejected now reach the executors directly — find_summable_index_for_where_clauses collapses duplicate same-field clauses via BTreeSet so they may succeed with arbitrarily-reduced results, and previously-canonicalized range pairs may pick a different covering index. Run the validator at the entry of the joint dispatcher (or one level up in execute_document_average_request) to restore the pre-refactor accept/reject contract.
suggestion: Empty-where read_primary_key_count_sum_tree fast path has no parity test
packages/rs-drive/src/query/drive_document_count_and_sum_query/executors/total.rs (line 72)
The consensus-critical fast path triggered when where_clauses.is_empty() && document_type.documents_countable() && documents_summable() == Some(sum_property) reads the primary-key count-sum-bearing tree directly via read_primary_key_count_sum_tree (total.rs:185-217), which raises CorruptedCodeExecution when the element is missing. The new parity test joint_total_executor_matches_independent_count_plus_sum (drive_document_average_query/drive_dispatcher.rs:923-1008) explicitly acknowledges in comments at lines 990-998 that the fixture cannot produce a doctype with both documents_summable and documents_countable, so it substitutes an Equal-only-fully-covered case instead. That leaves the doctype-level empty-where fast path and read_primary_key_count_sum_tree untested. A regression here would silently mis-aggregate the most-trafficked AVG shape (unfiltered total). Extend the test fixture to produce a doctype with both flags and add a direct parity test against the empty-where Total branch.
nitpick: Drop predicate count == 0 && sum == 0 does not match sum's distinct contract
packages/rs-drive/src/query/drive_document_count_and_sum_query/executors/range_no_proof.rs (line 197)
The inline comment at lines 197-200 claims the conjunctive predicate 'matches sum's distinct contract (which drops sum == 0 rows)', but sum's distinct executor at drive_document_sum_query/execute_range_sum.rs:187 drops solely on sum == 0 regardless of count. The joint predicate count == 0 && sum == 0 is strictly weaker: a row with count > 0, sum == 0 (e.g., several documents whose property value is 0, or whose signed values cancel) is retained by the joint AVG executor but dropped by sum's standalone distinct dispatch. The widening is arguably correct for AVG semantics — count > 0 is informative even when sum is zero — but the comment as written is wrong. Either reword to acknowledge the intentional widening, or tighten the predicate to match sum literally. No parity test exercises a sum == 0, count > 0 row, so the divergence is currently unobserved.
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
- [BLOCKING] In `packages/rs-drive/src/query/drive_document_count_and_sum_query/drive_dispatcher.rs`:119-158: AVG no-prove dispatcher drops request.limit on grouped range paths
Both the `PerInValue` arm (lines 120-124) and the `RangeNoProof` arm (lines 145-149) build `RangeSumOptions { carrier_outer_limit: None, .. }` and never read `request.limit` or `request.drive_config`. The executor in `range_no_proof.rs:124-128` then calls `sum_query.distinct_sum_path_query(None::<u16>, ...)`, leaving the outer walk unbounded. This contradicts the documented `DocumentAverageRequest::limit` contract on the no-proof path (`drive_document_average_query/mod.rs:121-145`), which specifies that unset `limit` falls back to `drive_config.default_query_limit` and explicit `limit > max_query_limit` clamps to `max_query_limit`. Count's dispatcher honors this at `drive_document_count_query/drive_dispatcher.rs:496-503`; the joint AVG dispatcher has no equivalent. User-visible effect: `SELECT AVG(...) GROUP BY <range field> LIMIT n` returns more than n rows, and broad GroupByRange/GroupByCompound queries on high-cardinality `rangeAverageable` indexes walk and serialize every matched terminator regardless of the operator's `max_query_limit`. The path is reachable from unauthenticated DAPI traffic, making this both a correctness regression and a DoS amplification surface. Compute `effective_limit = request.limit.unwrap_or(request.drive_config.default_query_limit as u32).min(request.drive_config.max_query_limit as u32)` for the distinct shapes and propagate it into `RangeSumOptions::carrier_outer_limit` and the downstream `distinct_sum_path_query` call.
- [SUGGESTION] In `packages/rs-drive/src/query/drive_document_count_and_sum_query/executors/range_no_proof.rs`:110-188: Flat-aggregate range branch replaces O(log n) merk accumulator with O(n) Rust fold
For `AverageMode::Aggregate` + range and the compound `GroupByIn + range` fold (`!options.return_distinct_sums_in_range` branch, lines 158-187), the executor routes through `distinct_sum_path_query` + `grove_get_raw_path_query` and folds `(count, sum)` over every visited PCPS terminator in Rust. Sum's pre-refactor flat-aggregate path used `grove.query_aggregate_sum` (the merk-internal `AggregateSumOnRange` primitive, O(log n) — returns a single i64), and count's analog used `query_aggregate_count`. The pre-refactor AVG composition therefore performed two O(log n) accumulator reads; this PR replaces them with one O(n) walk over every distinct in-range terminator. The module docstring (lines 8-29) acknowledges this as a known tradeoff pending a combined grovedb `query_aggregate_count_and_sum` primitive, but the asymptotic regression is real on broad range queries reachable from untrusted DAPI input — and the PR description's blanket 'halving the grovedb work per AVG query' claim does not hold on this branch. Until the grovedb-side primitive lands, consider keeping the flat-aggregate non-distinct range branch on the parallel `query_aggregate_sum + query_aggregate_count` composition under a shared transaction, and reserve the joint walk for shapes that already require materializing distinct keys.
- [SUGGESTION] In `packages/rs-drive/src/query/drive_document_count_and_sum_query/drive_dispatcher.rs`:75-108: AVG no-prove path no longer runs validate_and_canonicalize_where_clauses
`execute_document_count_and_sum_request` forwards `request.where_clauses` to `detect_sum_mode_from_inputs` and the joint executors verbatim (line 97, 101), and `execute_document_average_request` (drive_document_average_query/drive_dispatcher.rs:83-86) does no validation either. Pre-refactor the AVG no-prove path composed `DocumentCountRequest + DocumentSumRequest`, and count's dispatcher (drive_document_count_query/drive_dispatcher.rs:432) ran `validate_and_canonicalize_where_clauses(request.where_clauses)?`, which rejects malformed shapes (duplicate `Equal` on the same field, multiple `In`s, contradictory ranges) and canonicalizes `[field > A, field < B]` pairs into a single `field between* [A, B]` clause. Sum never validated; count was the validator on the AVG composition path. As a result, malformed AVG queries that count previously rejected now reach the executors directly — `find_summable_index_for_where_clauses` collapses duplicate same-field clauses via `BTreeSet` so they may succeed with arbitrarily-reduced results, and previously-canonicalized range pairs may pick a different covering index. Run the validator at the entry of the joint dispatcher (or one level up in `execute_document_average_request`) to restore the pre-refactor accept/reject contract.
- [SUGGESTION] In `packages/rs-drive/src/query/drive_document_count_and_sum_query/executors/total.rs`:72-86: Empty-where read_primary_key_count_sum_tree fast path has no parity test
The consensus-critical fast path triggered when `where_clauses.is_empty() && document_type.documents_countable() && documents_summable() == Some(sum_property)` reads the primary-key count-sum-bearing tree directly via `read_primary_key_count_sum_tree` (total.rs:185-217), which raises `CorruptedCodeExecution` when the element is missing. The new parity test `joint_total_executor_matches_independent_count_plus_sum` (drive_document_average_query/drive_dispatcher.rs:923-1008) explicitly acknowledges in comments at lines 990-998 that the fixture cannot produce a doctype with both `documents_summable` and `documents_countable`, so it substitutes an Equal-only-fully-covered case instead. That leaves the doctype-level empty-where fast path and `read_primary_key_count_sum_tree` untested. A regression here would silently mis-aggregate the most-trafficked AVG shape (unfiltered total). Extend the test fixture to produce a doctype with both flags and add a direct parity test against the empty-where Total branch.
- [NITPICK] In `packages/rs-drive/src/query/drive_document_count_and_sum_query/executors/range_no_proof.rs`:197-203: Drop predicate count == 0 && sum == 0 does not match sum's distinct contract
The inline comment at lines 197-200 claims the conjunctive predicate 'matches sum's distinct contract (which drops sum == 0 rows)', but sum's distinct executor at `drive_document_sum_query/execute_range_sum.rs:187` drops solely on `sum == 0` regardless of count. The joint predicate `count == 0 && sum == 0` is strictly weaker: a row with `count > 0, sum == 0` (e.g., several documents whose property value is 0, or whose signed values cancel) is retained by the joint AVG executor but dropped by sum's standalone distinct dispatch. The widening is arguably correct for AVG semantics — `count > 0` is informative even when sum is zero — but the comment as written is wrong. Either reword to acknowledge the intentional widening, or tighten the predicate to match sum literally. No parity test exercises a `sum == 0, count > 0` row, so the divergence is currently unobserved.
Inline posting hit GitHub HTTP 422, so I posted the same verified findings as a top-level review body.
|
@thepastaclaw the review header pinned [blocking] Limit drops on RangeNoProof + PerInValue — Both arms now honor the documented contract.
[suggestion] Flat-aggregate range O(n) Rust fold — Fixed in 0f6512ee63. The [suggestion] validate_and_canonicalize_where_clauses — Fixed in 5ed1f0deea. [suggestion] Empty-where fast path test — Fixed in 5ed1f0deea. Test [nitpick] Drop predicate comment — Fixed in 5ed1f0deea. Test counts at HEAD: |
thepastaclaw
left a comment
There was a problem hiding this comment.
Re-ran the review against current HEAD (5ed1f0d). I verified the five findings from my stale review are addressed now:\n\n- distinct RangeNoProof limit derives default/clamp and is passed into distinct_sum_path_query\n- PerInValue applies the caller limit after ordering, clamped to max_query_limit\n- aggregate range uses query_aggregate_count + query_aggregate_sum instead of a PCPS fold\n- validate_and_canonicalize_where_clauses runs before mode detection\n- empty-where count+sum fast path coverage and the drop-predicate comment are updated\n\nValidation: cargo test -p drive --lib drive_document_average_query (15/15 passed). Thanks for the detailed receipts — my prior review was pinned to e7be89d and is stale at 5ed1f0d.
…rent prose The What's Next subsection on the joint dispatcher was written as a post-#3687 migration note ("used to compose... as of #3687... halving the grovedb work"), which doesn't carry meaning for readers arriving after the merge. Two issues besides the tense: - The "single grovedb walk" / "halving the per-query grovedb work" framing applies only to the distinct shapes; the aggregate-range branches use grovedb's bounded merk-internal `query_aggregate_count` + `query_aggregate_sum` accumulators (two O(log n) reads), not a single PCPS walk. The original wording overstated the win. - The references to #3687 and the historical pre-refactor composition belong in PR descriptions / commit messages, not in the book chapter that documents the current architecture. Rewrite as present-tense prose describing the three per-mode shapes (Total/PerInValue point-lookup, RangeNoProof distinct PCPS walk, RangeNoProof aggregate merk accumulators) and the residual grovedb opportunity (a combined no-prove `query_aggregate_count_and_sum` primitive) without referencing the migration history. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The module docstring at `drive_document_count_and_sum_query/mod.rs` and the dispatcher's docstring at `drive_dispatcher.rs` still described the pre-fix behavior — "the range-no-proof flat summed branches walk PCPS elements via grove_get_raw_path_query and fold in Rust" and "one grovedb walk per AVG no-prove query (halving the work)". Both claims were superseded by the P1 fix in 0f6512e: the aggregate range branches now use `query_aggregate_count` + `query_aggregate_sum` merk-internal accumulators, and only the distinct shapes do a single PCPS walk. Rewrite both docstrings as present-tense descriptions of the per-shape execution contract (point-lookup walk for Total/PerInValue, single PCPS walk for distinct range, two bounded accumulator reads for aggregate range — with compound In+range per-In fan-out). Drop the migration-flavored "halves the work" framing; it overstated the win on aggregate range and was a changelog snippet that didn't carry meaning post-merge. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (2)
book/src/drive/average-index-examples.md (2)
1711-1711: ⚡ Quick winClarify or define the
rangeAverageableterm.Line 1711 uses "
rangeAverageable" to describe an index, but this property name doesn't appear in the grades contract definition (lines 54-61 showrangeCountable: trueandrangeSummable: trueas separate properties). Readers may search for "rangeAverageable" and fail to find it in the actual schema.Suggest either:
- Define "rangeAverageable" explicitly (e.g., "an index with both
rangeCountable: trueANDrangeSummable: true"), or- Use the explicit phrasing: "range on a
rangeCountable + rangeSummableindex" to match terminology used elsewhere in the doc📝 Suggested clarification
-- **`RangeNoProof` distinct shapes** (`GroupByRange` / `GroupByCompound` + range on a `rangeAverageable` index) walk `ProvableCountProvableSumTree` terminators once via the same `distinct_sum_path_query` builder the sum surface uses, emitting one `(count, sum)` per distinct in-range key — strictly better than the count + sum surfaces' parallel walks because both metrics come from each visited element. +- **`RangeNoProof` distinct shapes** (`GroupByRange` / `GroupByCompound` + range on an index with both `rangeCountable: true` and `rangeSummable: true`) walk `ProvableCountProvableSumTree` terminators once via the same `distinct_sum_path_query` builder the sum surface uses, emitting one `(count, sum)` per distinct in-range key — strictly better than the count + sum surfaces' parallel walks because both metrics come from each visited element.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@book/src/drive/average-index-examples.md` at line 1711, The term "rangeAverageable" is not defined in the contract and may confuse readers; replace or define it so it matches the schema: either explicitly state that "rangeAverageable" means an index with both rangeCountable: true AND rangeSummable: true, or change the phrase on line 1711 to "range on a rangeCountable + rangeSummable index" (or similar explicit wording) so it matches the existing properties rangeCountable and rangeSummable used elsewhere.
1710-1710: 💤 Low valueMinor: Inconsistent backtick usage for query clause names.
Line 1710 uses backticks for
`In`but not forEqualin the same phrase: "no-range Equal/In". For consistency, both query clause names should have the same formatting.✨ Suggested style fix
-- **`Total` / `PerInValue`** (no-range Equal/`In` on a `summable + countable` index) walks the point-lookup path query and decodes `(count, sum)` from each visited `CountSumTree` terminator in one call via `Element::count_sum_value_or_default()`. One grovedb call per `In` branch, both metrics together. +- **`Total` / `PerInValue`** (no-range `Equal`/`In` on a `summable + countable` index) walks the point-lookup path query and decodes `(count, sum)` from each visited `CountSumTree` terminator in one call via `Element::count_sum_value_or_default()`. One grovedb call per `In` branch, both metrics together.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@book/src/drive/average-index-examples.md` at line 1710, The phrase "no-range Equal/`In`" uses inconsistent backtick formatting for query clause names; change it so both clauses match (e.g., backtick both as `` `Equal`/`In` `` or remove backticks for both as "Equal/In") in the "Total / PerInValue" description that references `CountSumTree` and Element::count_sum_value_or_default(), ensuring consistent styling for query clause names.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@book/src/drive/average-index-examples.md`:
- Line 1711: The term "rangeAverageable" is not defined in the contract and may
confuse readers; replace or define it so it matches the schema: either
explicitly state that "rangeAverageable" means an index with both
rangeCountable: true AND rangeSummable: true, or change the phrase on line 1711
to "range on a rangeCountable + rangeSummable index" (or similar explicit
wording) so it matches the existing properties rangeCountable and rangeSummable
used elsewhere.
- Line 1710: The phrase "no-range Equal/`In`" uses inconsistent backtick
formatting for query clause names; change it so both clauses match (e.g.,
backtick both as `` `Equal`/`In` `` or remove backticks for both as "Equal/In")
in the "Total / PerInValue" description that references `CountSumTree` and
Element::count_sum_value_or_default(), ensuring consistent styling for query
clause names.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 78dc92a7-dfeb-4a58-97f8-4b9416fd7528
📒 Files selected for processing (1)
book/src/drive/average-index-examples.md
- `rangeAverageable` is real DPP syntactic sugar but never defined inline in this chapter (the grades-contract example uses the longhand `rangeCountable: true + rangeSummable: true`). Spell out the index requirement explicitly and note `rangeAverageable: true` as the shorthand form, so readers searching the chapter for either spelling find a match. - Backtick `Equal` for parity with `` `In` `` in the same phrase.
|
Addressed both nits in a4c70da9cd:
|
thepastaclaw
left a comment
There was a problem hiding this comment.
Re-checked current HEAD a4c70da after the docs-only nit fixes. The delta addresses both CodeRabbit comments: Equal/In formatting is consistent, and the range index wording now spells out rangeCountable: true + rangeSummable: true while noting rangeAverageable: true shorthand. No further concerns from me.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## v3.1-dev #3690 +/- ##
============================================
+ Coverage 87.02% 87.17% +0.14%
============================================
Files 2596 2600 +4
Lines 316510 317961 +1451
============================================
+ Hits 275447 277173 +1726
+ Misses 41063 40788 -275
🚀 New features to boost your workflow:
|
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
Latest delta at 3a1cea1 is documentation-only (book/src/drive/average-index-examples.md). All three prior nitpicks against drive_dispatcher.rs remain valid in the joint count+sum AVG no-proof dispatcher: PerInValue arm contract drift, unreachable u16::try_from, and unused carrier_outer_limit field. No new security/blocking issues.
💬 3 nitpick(s)
3 additional finding(s)
nitpick: PerInValue arm leaves unset `limit` as `None`, drifting from `DocumentAverageRequest::limit`'s documented no-proof contract
packages/rs-drive/src/query/drive_document_count_and_sum_query/drive_dispatcher.rs (line 166)
DocumentAverageRequest::limit (drive_document_average_query/mod.rs:121-130) documents the no-proof contract as: unset limit falls back to drive_config.default_query_limit; explicit limit > max_query_limit clamps. The RangeNoProof arm (lines 205-214) honors that, but the PerInValue arm computes request.limit.map(|l| l.min(max_query_limit as u32)), leaving per_in_limit = None when the caller didn't set one. The inline comment at lines 157-165 also claims the policy mirrors count's PerInValue path, but count's dispatcher (drive_document_count_query/drive_dispatcher.rs:476) forces Some(MAX_LIMIT_AS_FAILSAFE) instead. The In-clause's 100-cap bounds the path today, so user-visible impact is minor, but the codepath matches neither the documented average contract nor its sibling executor. Either fall back to drive_config.default_query_limit for contract parity, or apply MAX_LIMIT_AS_FAILSAFE for count parity; also update the comment to match.
nitpick: `u16::try_from(effective_limit)` is unreachable — value is already bounded by `max_query_limit: u16`
packages/rs-drive/src/query/drive_document_count_and_sum_query/drive_dispatcher.rs (line 215)
effective_limit is request.limit.unwrap_or(default_query_limit as u32).min(max_query_limit as u32) (lines 205-211). Both default_query_limit and max_query_limit are u16 in DriveConfig (config.rs:52, 72), so the result is always <= u16::MAX regardless of caller input. The u16::try_from(l).map_err(...) block and its QuerySyntaxError::Unsupported branch (lines 217-224) cannot fire. Collapse to effective_limit.map(|l| l as u16) to remove the dead error path from a consensus-adjacent dispatcher.
let limit_u16 = effective_limit.map(|l| l as u16);
nitpick: `RangeSumOptions::carrier_outer_limit` set to `None` in both arms but never read by the joint executors
packages/rs-drive/src/query/drive_document_count_and_sum_query/drive_dispatcher.rs (line 152)
Both the PerInValue arm (line 154) and the RangeNoProof arm (line 192) construct RangeSumOptions { carrier_outer_limit: None, .. }, but a workspace grep finds zero read sites for this field — only assignments in the joint and sum dispatchers and the single struct definition at drive_document_sum_query/mod.rs:275. The joint executors use only options.left_to_right and options.return_distinct_sums_in_range; the real limit channel is the explicit limit_u16 argument passed into execute_document_count_and_sum_range_no_proof. Carrying an always-None field at the joint dispatch site is misleading — a future maintainer could reasonably assume that field bounds the walk. Either consume carrier_outer_limit in the joint executors (folding the limit argument into it) or use a joint-specific options struct exposing only the fields the joint executors actually read.
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
- [NITPICK] In `packages/rs-drive/src/query/drive_document_count_and_sum_query/drive_dispatcher.rs`:166-168: PerInValue arm leaves unset `limit` as `None`, drifting from `DocumentAverageRequest::limit`'s documented no-proof contract
`DocumentAverageRequest::limit` (drive_document_average_query/mod.rs:121-130) documents the no-proof contract as: unset limit falls back to `drive_config.default_query_limit`; explicit limit > `max_query_limit` clamps. The RangeNoProof arm (lines 205-214) honors that, but the PerInValue arm computes `request.limit.map(|l| l.min(max_query_limit as u32))`, leaving `per_in_limit = None` when the caller didn't set one. The inline comment at lines 157-165 also claims the policy mirrors count's PerInValue path, but count's dispatcher (drive_document_count_query/drive_dispatcher.rs:476) forces `Some(MAX_LIMIT_AS_FAILSAFE)` instead. The In-clause's 100-cap bounds the path today, so user-visible impact is minor, but the codepath matches neither the documented average contract nor its sibling executor. Either fall back to `drive_config.default_query_limit` for contract parity, or apply `MAX_LIMIT_AS_FAILSAFE` for count parity; also update the comment to match.
- [NITPICK] In `packages/rs-drive/src/query/drive_document_count_and_sum_query/drive_dispatcher.rs`:215-226: `u16::try_from(effective_limit)` is unreachable — value is already bounded by `max_query_limit: u16`
`effective_limit` is `request.limit.unwrap_or(default_query_limit as u32).min(max_query_limit as u32)` (lines 205-211). Both `default_query_limit` and `max_query_limit` are `u16` in `DriveConfig` (config.rs:52, 72), so the result is always `<= u16::MAX` regardless of caller input. The `u16::try_from(l).map_err(...)` block and its `QuerySyntaxError::Unsupported` branch (lines 217-224) cannot fire. Collapse to `effective_limit.map(|l| l as u16)` to remove the dead error path from a consensus-adjacent dispatcher.
- [NITPICK] In `packages/rs-drive/src/query/drive_document_count_and_sum_query/drive_dispatcher.rs`:152-194: `RangeSumOptions::carrier_outer_limit` set to `None` in both arms but never read by the joint executors
Both the PerInValue arm (line 154) and the RangeNoProof arm (line 192) construct `RangeSumOptions { carrier_outer_limit: None, .. }`, but a workspace grep finds zero read sites for this field — only assignments in the joint and sum dispatchers and the single struct definition at drive_document_sum_query/mod.rs:275. The joint executors use only `options.left_to_right` and `options.return_distinct_sums_in_range`; the real limit channel is the explicit `limit_u16` argument passed into `execute_document_count_and_sum_range_no_proof`. Carrying an always-None field at the joint dispatch site is misleading — a future maintainer could reasonably assume that field bounds the walk. Either consume `carrier_outer_limit` in the joint executors (folding the `limit` argument into it) or use a joint-specific options struct exposing only the fields the joint executors actually read.
Inline posting hit GitHub HTTP 422, so I posted the same verified findings as a top-level review body.
Three thepastaclaw nits on the joint count+sum dispatcher.
PerInValue limit fallback. The arm computed
`request.limit.map(|l| l.min(max_query_limit))`, leaving the limit
as `None` when the caller didn't supply one — neither matching
`DocumentAverageRequest::limit`'s documented no-proof contract
(unset → `default_query_limit`) nor count's PerInValue policy
(hard-coded `MAX_LIMIT_AS_FAILSAFE`). Honor the AVG contract: fall
back to `default_query_limit`, clamp to `max_query_limit`. Updated
the inline comment to match. New regression test
`per_in_value_avg_no_proof_defaults_limit_to_operator_default_query_limit`
pins the no-limit-set case.
Unreachable u16::try_from. The RangeNoProof arm computed
`effective_limit.unwrap_or(default_query_limit as u32).min(max_query_limit as u32)`
where both `default_query_limit` and `max_query_limit` are u16 in
`DriveConfig`, so the result is always ≤ u16::MAX. The `u16::try_from`
+ `QuerySyntaxError::Unsupported` branch could not fire — dead code
in a consensus-adjacent dispatcher. Collapse to `as u16`.
Unused `carrier_outer_limit` field. Both arms constructed
`RangeSumOptions { carrier_outer_limit: None, .. }` but the joint
executors only read `left_to_right` and (for RangeNoProof)
`return_distinct_sums_in_range`. Carrying an always-None field was
misleading — a future maintainer could reasonably assume it bounded
the walk. Drop `RangeSumOptions` from the joint executor signatures
and pass the two booleans directly; `limit` continues to flow as an
explicit argument.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Addressed all three nits in 8dbe4a156e: PerInValue limit fallback — Now follows Unreachable u16::try_from — Collapsed to Unused
|
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
All six reviewers (claude × 3, codex × 3) independently re-validated PR #3690 at 8dbe4a1 and converged: the three prior nitpicks against 3a1cea1 are FIXED in this delta, and no new findings were raised. I spot-verified the dispatcher: line 165-168 applies the documented default/clamp limit for PerInValue, lines 202-208 collapse the unreachable u16::try_from to a direct safe cast after clamping, and the RangeSumOptions import at line 42 is gone (only DocumentSumMode, SumMode remain). The PerInValue regression test pinning the unset-limit fallback is in place.
thepastaclaw
left a comment
There was a problem hiding this comment.
Re-checked current HEAD 8dbe4a1 after the three nit fixes. The prior nitpicks are addressed: PerInValue now uses the documented default/clamp limit contract, the bounded RangeNoProof limit cast is a direct safe as u16, and the joint executor signatures no longer carry the unused RangeSumOptions::carrier_outer_limit field.
Validation: cargo test -p drive --lib drive_document_average_query (16/16 passed). No further concerns from me.
…call Bumps the grovedb pin to 9db515ef (head of dashpay/grovedb#676), which adds the combined no-prove `query_aggregate_count_and_sum` accumulator — a merk-internal `(u128, i128)` walk narrowed to `(u64, i64)` at the entry point, mirroring the proof-side `AggregateCountAndSumOnRange` shape. Rewires `flat_aggregate_count_and_sum` in the joint count+sum executor to issue one combined accumulator call against `aggregate_count_and_sum_path_query` instead of two parallel `query_aggregate_count` + `query_aggregate_sum` calls. Same cost class (O(log n) at the merk layer either way) but halves the round-trip count and removes the need for the shared-tx-across-two-reads plumbing on the flat branch. Compound `In + range` per-In fan-out still issues one accumulator call per branch under a shared read transaction — now one call per branch instead of two. The integration lands ahead of the grovedb PR's review so the platform side can exercise the primitive end-to-end. If the grovedb PR's review surfaces changes, the platform pin re-bumps trivially. Updated docstrings on `drive_document_count_and_sum_query/mod.rs`, `drive_dispatcher.rs`, `executors/range_no_proof.rs`, and the `book/src/drive/average-index-examples.md` chapter section to reflect the single-call aggregate branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After bumping grovedb to a rev that includes the combined `query_aggregate_count_and_sum` primitive (dcc3824), several docstrings and comments still described the old two-call shape or carried migration-flavored references ("pre-#3687 implementation", "pre-#3690 composition path", "halving the per-query work", "Two calls per branch", "combined no-proof primitive doesn't exist yet"). Sweep them out. Each rewritten as present-tense architecture prose: the aggregate range branch issues one accumulator call per branch; compound In + range per-In fans out to one call per branch under a shared read tx; the joint dispatcher reuses sum's routing table; the top-level AVG dispatcher splits prove/no-prove. No code changes — docs/comments only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
Verified both agents' findings against the worktree at dcc3824. Two convergent nitpicks: doc/comment drift in range_no_proof.rs that still claims the aggregate-range branch issues two grovedb calls and references the absence of a combined no-proof primitive, contradicting the new single query_aggregate_count_and_sum call site at lines 354-360 and the surrounding refreshed module/dispatcher/book docs. Pure documentation drift, no functional or consensus impact.
💬 2 nitpick(s)
2 additional finding(s)
nitpick: Stale inline comment: aggregate branch is now one combined accumulator call, not two
packages/rs-drive/src/query/drive_document_count_and_sum_query/executors/range_no_proof.rs (line 103)
This comment introducing the !return_distinct branch still says Two calls because the combined no-proof primitive doesn't exist yet, but each is O(log n). After the grovedb rev bump to 9db515ef and the rewrite of flat_aggregate_count_and_sum (lines 334-361), the flat aggregate branch issues exactly one query_aggregate_count_and_sum call, and each per-In branch in the compound shape (lines 287-297) is also a single combined call. The surrounding module docstring (lines 1-36), the dispatcher docs at drive_dispatcher.rs, and the book chapter were all refreshed to describe the combined primitive — this block was missed and now actively contradicts the helper's own docstring (lines 326-332). Future readers will trust the nearer inline comment over the docstring further away.
// Aggregate shape: use grovedb's combined merk-internal
// `query_aggregate_count_and_sum` accumulator to fold
// `(count, sum)` in one O(log n) walk — strictly bounded
// regardless of how many documents the range matches. Same
// cost class as the pre-#3687 composed count + sum
// dispatchers, with one fewer merk-internal read per branch.
nitpick: Stale docstring on `aggregate_range_count_and_sum`: two-call shape and two-sub-dispatch atomicity no longer apply
packages/rs-drive/src/query/drive_document_count_and_sum_query/executors/range_no_proof.rs (line 191)
The docstring states Two calls per branch (count + sum); compound In + range per-In fans out and sums per-branch totals and the atomicity note says the count and sum reads see the same grovedb snapshot — same atomicity contract the pre-#3687 dispatcher implemented across its two sub-dispatches. After the delta, each per-In branch (and the flat path) performs ONE combined query_aggregate_count_and_sum call via flat_aggregate_count_and_sum. The shared local_tx (lines 219-225) is still meaningful — it ensures all per-In branches in the compound shape see one snapshot — but the framing in terms of pairing a count read with a sum read within a single branch is stale. Reword to match what the code actually does while preserving the (still-valid) per-In atomicity rationale.
/// Aggregate-range branch: returns `(count, sum)` via grovedb's
/// combined `query_aggregate_count_and_sum` merk-internal
/// accumulator. One call per branch; compound `In + range` per-In
/// fans out and sums per-branch totals.
///
/// All branches share a read transaction (opened internally when
/// the caller didn't supply one) so the per-In branches in the
/// compound shape see one snapshot — preserves the atomicity
/// contract the pre-#3687 dispatcher implemented across its two
/// sub-dispatches even though each branch is now a single read.
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
- [NITPICK] In `packages/rs-drive/src/query/drive_document_count_and_sum_query/executors/range_no_proof.rs`:103-108: Stale inline comment: aggregate branch is now one combined accumulator call, not two
This comment introducing the `!return_distinct` branch still says `Two calls because the combined no-proof primitive doesn't exist yet, but each is O(log n)`. After the grovedb rev bump to 9db515ef and the rewrite of `flat_aggregate_count_and_sum` (lines 334-361), the flat aggregate branch issues exactly one `query_aggregate_count_and_sum` call, and each per-In branch in the compound shape (lines 287-297) is also a single combined call. The surrounding module docstring (lines 1-36), the dispatcher docs at drive_dispatcher.rs, and the book chapter were all refreshed to describe the combined primitive — this block was missed and now actively contradicts the helper's own docstring (lines 326-332). Future readers will trust the nearer inline comment over the docstring further away.
- [NITPICK] In `packages/rs-drive/src/query/drive_document_count_and_sum_query/executors/range_no_proof.rs`:191-199: Stale docstring on `aggregate_range_count_and_sum`: two-call shape and two-sub-dispatch atomicity no longer apply
The docstring states `Two calls per branch (count + sum); compound In + range per-In fans out and sums per-branch totals` and the atomicity note says `the count and sum reads see the same grovedb snapshot — same atomicity contract the pre-#3687 dispatcher implemented across its two sub-dispatches`. After the delta, each per-In branch (and the flat path) performs ONE combined `query_aggregate_count_and_sum` call via `flat_aggregate_count_and_sum`. The shared `local_tx` (lines 219-225) is still meaningful — it ensures all per-In branches in the compound shape see one snapshot — but the framing in terms of pairing a count read with a sum read within a single branch is stale. Reword to match what the code actually does while preserving the (still-valid) per-In atomicity rationale.
Inline posting hit GitHub HTTP 422, so I posted the same verified findings as a top-level review body.
|
@thepastaclaw review pinned Inline comment around line 103 — current state at if !return_distinct {
// Aggregate shape: one combined merk-internal accumulator
// call (`query_aggregate_count_and_sum`) yielding
// `(u64, i64)` in O(log n) — strictly bounded regardless
// of how many documents the range matches. Compound
// `In + range` per-In fans out to one accumulator call
// per branch under a shared read transaction (see
// `aggregate_range_count_and_sum` below).
return self.aggregate_range_count_and_sum(
/// Aggregate-range branch: returns `(count, sum)` via grovedb's
/// combined `query_aggregate_count_and_sum` accumulator. One call
/// for the flat shape; compound `In + range` per-In fans out and
/// sums per-branch totals (one accumulator call per branch). All
/// branches share a read transaction (opened internally when the
/// caller didn't supply one) so per-In sub-reads see the same
/// grovedb snapshot.
Both pass the grep |
thepastaclaw
left a comment
There was a problem hiding this comment.
Rechecked current HEAD 866b74f501 after the follow-up docs sweep. The two stale-comment/docstring findings from the dcc3824d review are fixed: the aggregate branch comment and aggregate_range_count_and_sum docs now describe the single query_aggregate_count_and_sum call / per-In fan-out accurately, and the stale "two calls" / "pre-#3687" wording grep is clean across the joint AVG/count+sum docs.
Validation: cargo test -p drive --lib drive_document_average_query → 16/16 passed.
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
Latest delta (dcc3824..866b74f) is documentation-only and resolves both prior nitpicks about stale 'two calls / pre-#3687' framing in range_no_proof.rs. One residual same-class drift remains: the inline comment introducing the shared read transaction at lines 214-217 still describes it as bridging 'the count and sum accumulator calls', even though each branch now issues a single combined accumulator call.
💬 1 nitpick(s)
1 additional finding(s)
nitpick: Residual stale comment: shared-tx rationale still references paired count+sum calls
packages/rs-drive/src/query/drive_document_count_and_sum_query/executors/range_no_proof.rs (line 214)
The surrounding module-level rationale (lines 103-109) and the helper's docstring (lines 192-198) were refreshed in this delta to describe the combined query_aggregate_count_and_sum primitive — one accumulator call per branch. This inner comment was missed: it still reads Open a shared read transaction across the count and sum accumulator calls (and across per-In branches in the compound shape) so they see one snapshot. With the combined primitive, there are no longer paired count/sum calls to bridge. The flat path issues a single read (atomic without a transaction); only the per-In compound fan-out actually benefits from the shared snapshot. A future reader will trust the inner comment over the just-refreshed docstring two lines above.
// Open a shared read transaction across per-In branches in
// the compound shape so each branch's accumulator call sees
// the same grovedb snapshot. The flat path issues a single
// read and gets atomicity for free; the transaction is
// harmless there. Read-only; dropped without commit at scope
// end.
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
- [NITPICK] In `packages/rs-drive/src/query/drive_document_count_and_sum_query/executors/range_no_proof.rs`:214-217: Residual stale comment: shared-tx rationale still references paired count+sum calls
The surrounding module-level rationale (lines 103-109) and the helper's docstring (lines 192-198) were refreshed in this delta to describe the combined `query_aggregate_count_and_sum` primitive — one accumulator call per branch. This inner comment was missed: it still reads `Open a shared read transaction across the count and sum accumulator calls (and across per-In branches in the compound shape) so they see one snapshot`. With the combined primitive, there are no longer paired count/sum calls to bridge. The flat path issues a single read (atomic without a transaction); only the per-In compound fan-out actually benefits from the shared snapshot. A future reader will trust the inner comment over the just-refreshed docstring two lines above.
Inline posting failed; posted the verified findings in the top-level review body.
Caught in a thepastaclaw sweep. The shared-read-transaction rationale inside aggregate_range_count_and_sum still framed the snapshot as bridging "the count and sum accumulator calls" — accurate before the grovedb pin bump to 9db515ef, stale after. Each per-In branch now issues a single combined `query_aggregate_count_and_sum` call, so the shared tx only matters for atomicity across per-In branches in the compound shape; the flat path gets atomicity for free from its single read. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
packages/rs-drive/Cargo.toml (1)
55-60: ⚡ Quick winConsider centralizing GroveDB git revision in
[workspace.dependencies].All 12 GroveDB dependencies across 6 crates use the identical revision (
9db515efe59602e536eb75e79308e0cf8f8474be). Moving base definitions to the workspace-level[workspace.dependencies]section—which already exists and follows this pattern for dashcore dependencies—would eliminate duplication and reduce drift risk during future version bumps.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/rs-drive/Cargo.toml` around lines 55 - 60, Multiple grovedb entries (grovedb, grovedb-costs, grovedb-path, grovedb-storage, grovedb-version, grovedb-epoch-based-storage-flags) repeat the same git rev; move their shared git/rev definition to the workspace-level [workspace.dependencies] and make the per-crate entries refer to that centralized dependency to avoid duplication and drift—update the workspace's Cargo.toml by adding each grovedb dependency with the git + rev there (matching the current rev 9db515efe59602e536eb75e79308e0cf8f8474be) and remove the git/rev fields from the individual package entries so they inherit the workspace definitions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@packages/rs-drive/Cargo.toml`:
- Around line 55-60: Multiple grovedb entries (grovedb, grovedb-costs,
grovedb-path, grovedb-storage, grovedb-version,
grovedb-epoch-based-storage-flags) repeat the same git rev; move their shared
git/rev definition to the workspace-level [workspace.dependencies] and make the
per-crate entries refer to that centralized dependency to avoid duplication and
drift—update the workspace's Cargo.toml by adding each grovedb dependency with
the git + rev there (matching the current rev
9db515efe59602e536eb75e79308e0cf8f8474be) and remove the git/rev fields from the
individual package entries so they inherit the workspace definitions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 33fac876-23b4-480f-85e6-e857b6c72397
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (12)
book/src/drive/average-index-examples.mdpackages/rs-dpp/Cargo.tomlpackages/rs-drive-abci/Cargo.tomlpackages/rs-drive/Cargo.tomlpackages/rs-drive/src/query/drive_document_average_query/drive_dispatcher.rspackages/rs-drive/src/query/drive_document_count_and_sum_query/drive_dispatcher.rspackages/rs-drive/src/query/drive_document_count_and_sum_query/executors/per_in_value.rspackages/rs-drive/src/query/drive_document_count_and_sum_query/executors/range_no_proof.rspackages/rs-drive/src/query/drive_document_count_and_sum_query/mod.rspackages/rs-platform-version/Cargo.tomlpackages/rs-platform-wallet/Cargo.tomlpackages/rs-sdk/Cargo.toml
✅ Files skipped from review due to trivial changes (1)
- packages/rs-platform-version/Cargo.toml
|
Deferring the workspace-deps centralization to a separate PR. The duplication predates this PR — grovedb refs appear in 6 packages with ~26 entries total (rs-drive 14, rs-drive-abci 5, rs-dpp 2, rs-sdk 2, rs-platform-wallet 2, rs-platform-version 1). This PR's grovedb pin bump just changed the rev on existing entries; it didn't add new duplication. Centralizing them in The bump-cost concern is real but mitigated for this PR: the rev bump was a single |
thepastaclaw
left a comment
There was a problem hiding this comment.
Re-checked current HEAD 32602ee after the follow-up comment cleanup. The residual stale shared-tx rationale from the 866b74f review is fixed: it now correctly says the shared transaction matters for per-In branch snapshot consistency, while the flat path is a single combined accumulator read and gets atomicity for free. Latest delta is comment-only; no further concerns from me on that change.
dashpay/grovedb#676 (the no-prove `query_aggregate_count_and_sum` primitive) was squash-merged into develop as commit 60f29685. The content is identical to the PR-branch head 9db515ef that the previous commit pinned; just moves to the canonical develop ancestry. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
thepastaclaw
left a comment
There was a problem hiding this comment.
Re-checked current HEAD 8b829e9 after the Grovedb pin bump. The delta from the last approval is limited to moving all existing Grovedb refs from 9db515e to 60f2968 plus the lockfile refresh; the AVG/count+sum code itself is unchanged.
I agree with deferring the workspace-dependencies centralization: the duplicated Grovedb refs are pre-existing workspace-wide debt, not introduced by this PR, and moving them changes dependency ownership across unrelated crates. Better as a focused cleanup PR than scope creep here.
Validation: cargo test -p drive --lib drive_document_average_query passes (16/16).
Issue being fixed or feature implemented
Closes #3687.
The AVG dispatcher's no-prove path previously split a single
DocumentAverageRequestinto two sub-requests (DocumentCountRequest+DocumentSumRequest), dispatched each through its per-surface dispatcher, then zipped the responses. The shape was correct (a shared read transaction guaranteed atomicity), but every AVG no-prove query did twice the grovedb work strictly necessary, and required count's and sum's routing tables to stay in lock-step for AVG composition to work. PR #3661 caught one drift bug between those tables. The prove path was already correct (one PCPS walk viaAggregateCountAndSumOnRange).What was done?
Added
Drive::execute_document_count_and_sum_requestin a new modulepackages/rs-drive/src/query/drive_document_count_and_sum_query/. The new dispatcher:DocumentAverageRequestand returns the existingDocumentAverageResponse(no new wire types — issue explicitly authorized reuse).detect_sum_mode_from_inputs), convertingAverageMode→SumMode1:1, so the (where × mode) → executor mapping has a single source of truth.(count, sum)from every visited count-sum-bearing element viaElement::count_sum_value_or_default():execute_document_count_and_sum_total_no_proof— empty-where primary-key fast path (gated ondocuments_summable + documents_countable) AND Equal-only-fully-covered point lookup against asummable + countableindex.execute_document_count_and_sum_per_in_value_no_proof— In on a non-range field, per-In fan-out under a shared internal read transaction.execute_document_count_and_sum_range_no_proof— range queries; the flat-summed branch walks PCPS terminator elements viadistinct_sum_path_queryand folds in Rust (grovedb's pinned rev has no no-provequery_aggregate_count_and_sumprimitive yet — that's an optional future optimization noted inline).RangeNoProofpath soGroupByIn + rangereturnsEntries(vec![one_entry])rather thanAggregate { … }(parity with sum's pre-drive: unify AVG no-prove dispatch into single DocumentCountSumRequest + joint executors (avoid double grovedb walk) #3687 output shape).The AVG dispatcher's no-prove arm collapses to one delegating call to the joint dispatcher. The
count_request/sum_requestconstruction, theAverageMode → (CountMode, SumMode)adapter, thezip_entrieshelper + tests, thelocal_txplumbing, and the inline issue-3687 follow-up comment block are all removed. The prove path is unchanged.Module docstrings and the
What's Nextsection ofbook/src/drive/average-index-examples.mdare updated to describe the unified walk and the remaining grovedb optimization opportunity.How Has This Been Tested?
cargo fmt -p drive --check— cleancargo clippy -p drive --lib --tests --no-deps -- -D warnings— cleancargo test -p drive --lib drive_document_average_query— 8/8 pass (3 retained prove-path tests + 5 new parity tests, one per joint executor / each flat-summed range branch)cargo test -p drive --lib -- drive_document_count_and_sum_query drive_document_sum_query drive_document_count_query— 76/76 passParity tests cross-check the joint executor's
(count, sum)against an independent count + sum dispatch under the same fixture, pinning behavior against the pre-#3687 double-dispatch shape on:summable + countable(Total path)Aggregate + rangeon arangeAverageablePCPS index (RangeNoProof flat-summed)GroupByIn + In + rangeon a PCPS index (RangeNoProof compound-summed)GroupByRange + range(RangeNoProof distinct)The
GroupByIn + rangeparity test caught a wire-shape regression in an earlier iteration where the joint executor returnedAggregate { … }instead of theEntries(vec![one_entry])shape sum emits for that combination; the dispatcher now applies sum's second-stage shape adapter.Breaking Changes
None.
DocumentAverageRequest/DocumentAverageResponseshapes are unchanged on both prove and no-prove paths.Checklist:
For repository code-owners and collaborators only
🤖 Generated with Claude Code
Summary by CodeRabbit
Documentation
Refactor
Tests