feat(drive): lift In-position restriction in regular document queries (Index::matches last-or-before-last) #3629

@QuantumExplorer

Description

Expected Behavior

Regular document queries should accept an `In` clause at any position in the chosen covering index, with any number of Equal clauses after it. Concretely:

```
WHERE a IN [..] AND b = y AND c = z
```

against an index `[a, b, c]` should be a valid query, routed through the existing prove/no-prove pipelines.

Current Behavior

`DriveDocumentQuery::find_best_index` → `DocumentType::index_for_types` → `Index::matches` (at packages/rs-dpp/src/data_contract/document_type/index/mod.rs:503-515) rejects any covering-index candidate where `In` is not on the last or before-last property:

```rust
// the in field can only be on the last or before last property
if let Some(in_field_name) = in_field_name {
    if last_property.name.as_str() != in_field_name {
        if self.properties.len() == 1 {
            return None;
        }
        let before_last_property = self.properties.get(self.properties.len() - 2)?;
        if before_last_property.name.as_str() != in_field_name {
            return None;
        }
    }
}
```

So `a IN [..] AND b = y AND c = z` against `[a, b, c]` finds no covering index and the query fails with `WhereClauseOnNonIndexedProperty`, even though `[a, b, c]` is the natural index for this query.
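To make the accepted and rejected positions concrete, here is a minimal standalone sketch of the position check only. `in_position_allowed` is a hypothetical helper, not the real `Index::matches`: it works on plain property names and assumes an empty index is rejected.

```rust
/// Simplified stand-in for the position check quoted above (hypothetical
/// helper, not the real `Index::matches`).
/// `properties` are the index's property names in index order; `in_field`
/// is the property carrying the `In` clause, if any.
fn in_position_allowed(properties: &[&str], in_field: Option<&str>) -> bool {
    let Some(in_field) = in_field else {
        return true; // no In clause: the position guard does not apply
    };
    let n = properties.len();
    if n == 0 {
        return false; // assumption: an empty index never matches
    }
    // Accept only the last or the before-last property.
    properties[n - 1] == in_field || (n >= 2 && properties[n - 2] == in_field)
}
```

With `["a", "b", "c"]`, an `In` on `c` or `b` passes, while an `In` on `a` is rejected, which is the case this issue is about.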

Root Cause (not policy — a real structural assumption)

`Index::matches`'s restriction isn't a design choice; it's a guard that prevents the picker from selecting a shape `DriveDocumentQuery::get_non_primary_key_path_query` (packages/rs-drive/src/query/mod.rs:1927-2150) can't actually build correctly. The path-construction code positionally zips `intermediate_indexes` (the index's first N property names) with `intermediate_values` (the Equal-clause values):

```rust
let (intermediate_indexes, last_indexes) =
    index.properties.split_at(intermediate_values.len());

for (intermediate_index, intermediate_value) in
    intermediate_indexes.iter().zip(intermediate_values.iter())
{
    path.push(intermediate_index.name.as_bytes().to_vec());
    path.push(intermediate_value.as_slice().to_vec());
}
path.push(last_index.name.as_bytes().to_vec());
```

This works only when every "intermediate" position is itself an Equal clause and the In/range sits at one of the two terminal positions:

  • `[a, b]` + `a = x AND b IN [..]`: `intermediate_values = [x]` → `intermediate_indexes = [a]` → zip pairs `(a, x)`. Correct.
  • `[a, b, c]` + `a = x AND b = y AND c IN [..]`: `intermediate_values = [x, y]` → `intermediate_indexes = [a, b]` → zip pairs `(a, x), (b, y)`. Correct.
  • `[a, b, c]` + `a IN [..] AND b = y AND c = z`: `intermediate_values = [y, z]` (the In contributes no value because it's not in `equal_clauses`) → `intermediate_indexes = [a, b]` → zip pairs `(a, y), (b, z)` — `y` ends up under name `a`, `z` under `b`. The path it would build is structurally wrong, so `Index::matches` refuses to even let the picker choose this shape.
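The three cases above can be reproduced in isolation. This is a sketch of the pairing logic only, using string stand-ins for property names and serialized values; `positional_pairs` is a hypothetical helper, and it assumes there are never more Equal values than index properties (otherwise `split_at` panics).

```rust
/// Mirrors the positional pairing: take the first `equal_values.len()`
/// property names and zip them with the Equal-clause values, ignoring
/// which property each value was actually written against.
fn positional_pairs<'a>(
    properties: &[&'a str],
    equal_values: &[&'a str],
) -> Vec<(&'a str, &'a str)> {
    // Assumption: equal_values.len() <= properties.len().
    let (intermediate, _last) = properties.split_at(equal_values.len());
    intermediate
        .iter()
        .copied()
        .zip(equal_values.iter().copied())
        .collect()
}
```

Calling it with `["a", "b", "c"]` and `["y", "z"]` (the In-on-`a` case) returns `[("a", "y"), ("b", "z")]`: each value lands under the wrong property name.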

There's special-case logic (lines 1947-1973) for the `In + range` compound case via a `subquery_clause`, which handles a single trailing leaf clause via `set_subquery_key` + `set_subquery`. But there's no `set_subquery_path` chaining for "In followed by multiple trailing Equals."

Possible Solution

Restructure path construction to either:

  1. Track In's position explicitly and stop the outer path at the In-bearing property's name subtree, then use `set_subquery_path` with the post-In Equals' `(prop_name, serialized_value)` pairs (plus any range-leaf subquery), then `set_subquery_key([0])` / `set_subquery` for the document descent. This is the same mechanism `distinct_count_path_query` and `point_lookup_count_path_query` already use in the count path (see `drive_document_count_query/path_query.rs`).
  2. Replace the positional `intermediate_indexes ↔ intermediate_values` zip with a name-keyed traversal of `index.properties` (same shape `expand_paths_and_count` already uses in the count path's no-proof executor).
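As a rough illustration of the name-keyed idea, the sketch below walks the index properties in order and looks each Equal value up by property name, splitting the pairs around the In-bearing property. `SplitPath` and `split_by_name` are hypothetical names, not existing drive types, and range clauses are deliberately not modeled.

```rust
use std::collections::BTreeMap;

/// Hypothetical result type: Equal pairs before and after the In property.
#[derive(Debug, PartialEq)]
struct SplitPath {
    prefix: Vec<(String, Vec<u8>)>, // Equal pairs before In (outer path)
    in_property: Option<String>,    // the In-bearing property, if reached
    suffix: Vec<(String, Vec<u8>)>, // Equal pairs after In (subquery path)
}

/// Walks `properties` in index order, resolving Equal values by name
/// rather than by position. Stops at the first property with no clause.
fn split_by_name(
    properties: &[&str],
    equals: &BTreeMap<String, Vec<u8>>,
    in_field: Option<&str>,
) -> SplitPath {
    let mut out = SplitPath { prefix: vec![], in_property: None, suffix: vec![] };
    for name in properties {
        if Some(*name) == in_field {
            out.in_property = Some(name.to_string());
        } else if let Some(value) = equals.get(*name) {
            let pair = (name.to_string(), value.clone());
            if out.in_property.is_some() {
                out.suffix.push(pair);
            } else {
                out.prefix.push(pair);
            }
        } else {
            break; // uncovered property: the covered part of the index ends here
        }
    }
    out
}
```

For `[a, b, c]` with `a IN [..] AND b = y AND c = z`, the prefix is empty, `in_property` is `a`, and the suffix is `(b, y), (c, z)`, so the values stay attached to the right names.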

Once `get_non_primary_key_path_query` no longer requires positional alignment, the `Index::matches` guard can be loosened — the rule becomes "exactly one `In` clause on the chosen index, anywhere," matching the constraint the count path now accepts.

Required follow-up touchpoints:

  • `OrderClause` validation when In is on a non-terminal position (which property defines the walk order? grovedb's `set_subquery_path` doesn't allow re-ordering within the descent, so the order_by surface needs to be specified)
  • `startsAt`/`startAfter` cursor semantics for In-on-prefix — current cursor logic assumes a single walk dimension
  • `limit`/`offset` semantics across the cartesian fork (the prove path will need the same "validate-don't-clamp" treatment as the count path's range modes: reject requests above `max_query_limit` rather than silently clamping them, to preserve proof determinism)
  • E2E test coverage across the new shapes; existing query tests focus on `In + range` and `In on terminal` only
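For the limit item, "validate-don't-clamp" could look roughly like the sketch below. `validate_limit`, `LimitError`, and the `default_limit` parameter are hypothetical names chosen for illustration; only the idea of rejecting against `max_query_limit` comes from this issue.

```rust
/// Hypothetical error type for an over-limit request.
#[derive(Debug, PartialEq)]
enum LimitError {
    ExceedsMax { requested: u16, max: u16 },
}

/// Rejects a limit above `max_query_limit` instead of clamping it, so the
/// executed query always uses exactly the limit the caller asked for.
fn validate_limit(
    requested: Option<u16>,
    default_limit: u16, // assumption: used when the caller gives no limit
    max_query_limit: u16,
) -> Result<u16, LimitError> {
    let limit = requested.unwrap_or(default_limit);
    if limit > max_query_limit {
        return Err(LimitError::ExceedsMax { requested: limit, max: max_query_limit });
    }
    Ok(limit)
}
```

With a maximum of 100, a request for 500 is an error rather than being silently executed as 100, so a verifier never sees a proof built for a different limit than the one requested.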

Alternatives Considered

  • Keep the restriction as-is. Documented and consistent, but users hitting natural query shapes (`brand IN [..] AND model = x AND year = y` on a `[brand, model, year]` index) need to either redesign their index or fall back to per-In-value sequential queries.
  • Lift on count path only. Already done in `#3623` (commit `18c13b0f41`). The count path is more permissive than regular doc query today: `a IN [..] AND b = y AND c = z` is supported for counting but not for fetching documents. Surfacing this asymmetry to users is reasonable as an interim step but a unified contract is the right end state.

Additional Context

The count path's relaxation in #3623 demonstrates that the underlying grovedb primitives (`set_subquery_path` + `set_subquery`) support this shape; the work is entirely in drive's regular document query construction logic. The count path's `point_lookup_count_path_query` is a working reference for the structure — adapted to drive's document materialization path, the principle is the same:

  • `base_path` ends at the In-bearing property's name subtree
  • outer `Query` keys = sorted serialized In values (lex-asc, for cursor / pagination determinism)
  • `set_subquery_path` carries the post-In Equal pairs in index order
  • `set_subquery` is the document-descent walk (the existing `recursive_create_query` / `recursive_insert_on_query` logic, but rooted at the resolved leaf rather than at the index level)
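The four points above can be sketched as plain data, without touching grovedb. `FanOutPlan` and `plan_fan_out` are hypothetical names; the sketch only computes the byte segments that would feed the outer path, the outer query keys, and `set_subquery_path`. Deduplicating the In values is an assumption added here, not something this issue specifies.

```rust
/// Hypothetical plan for an In clause at any index position.
#[derive(Debug, PartialEq)]
struct FanOutPlan {
    base_path: Vec<Vec<u8>>,     // ends at the In-bearing property's name
    outer_keys: Vec<Vec<u8>>,    // serialized In values, lexicographically ascending
    subquery_path: Vec<Vec<u8>>, // post-In (name, value) pairs in index order
}

fn plan_fan_out(
    index_root: Vec<Vec<u8>>,
    prefix: &[(&str, &[u8])], // Equal pairs before the In property
    in_property: &str,
    mut in_values: Vec<Vec<u8>>,
    suffix: &[(&str, &[u8])], // Equal pairs after the In property
) -> FanOutPlan {
    let mut base_path = index_root;
    for (name, value) in prefix {
        base_path.push(name.as_bytes().to_vec());
        base_path.push(value.to_vec());
    }
    base_path.push(in_property.as_bytes().to_vec());

    in_values.sort(); // byte-wise lexicographic order
    in_values.dedup(); // assumption: duplicate In values collapse to one key

    let mut subquery_path = Vec::new();
    for (name, value) in suffix {
        subquery_path.push(name.as_bytes().to_vec());
        subquery_path.push(value.to_vec());
    }
    FanOutPlan { base_path, outer_keys: in_values, subquery_path }
}
```

The document-descent subquery itself is left out; in the real code it would be whatever `recursive_create_query` produces, rooted at the resolved leaf.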

Because this is still pre-release, the change could land as a breaking change. After release, it would need to be gated behind a platform-version bump.

    Labels

    enhancement: New feature or request
