Expected Behavior
Regular document queries should accept an `In` clause at any position in the chosen covering index, with any number of Equal clauses after it. Concretely:
```
WHERE a IN [..] AND b = y AND c = z
```
against an index `[a, b, c]` should be a valid query, routed through the existing prove/no-prove pipelines.
Current Behavior
`DriveDocumentQuery::find_best_index` → `DocumentType::index_for_types` → `Index::matches` (at packages/rs-dpp/src/data_contract/document_type/index/mod.rs:503-515) rejects any covering-index candidate where `In` is not on the last or before-last property:
```rust
// the in field can only be on the last or before last property
if let Some(in_field_name) = in_field_name {
    if last_property.name.as_str() != in_field_name {
        if self.properties.len() == 1 {
            return None;
        }
        let before_last_property = self.properties.get(self.properties.len() - 2)?;
        if before_last_property.name.as_str() != in_field_name {
            return None;
        }
    }
}
```
So `a IN [..] AND b = y AND c = z` against `[a, b, c]` finds no covering index and the query fails with `WhereClauseOnNonIndexedProperty`, even though `[a, b, c]` is the natural index for this query.
Root Cause (not policy — a real structural assumption)
`Index::matches`'s restriction isn't a design choice; it's a guard that prevents the picker from selecting a shape `DriveDocumentQuery::get_non_primary_key_path_query` (packages/rs-drive/src/query/mod.rs:1927-2150) can't actually build correctly. The path-construction code positionally zips `intermediate_indexes` (the index's first N property names) with `intermediate_values` (the Equal-clause values):
```rust
let (intermediate_indexes, last_indexes) =
    index.properties.split_at(intermediate_values.len());
for (intermediate_index, intermediate_value) in
    intermediate_indexes.iter().zip(intermediate_values.iter())
{
    path.push(intermediate_index.name.as_bytes().to_vec());
    path.push(intermediate_value.as_slice().to_vec());
}
path.push(last_index.name.as_bytes().to_vec());
```
This works only when every "intermediate" position is itself an Equal clause and the In/range sits at one of the two terminal positions:
- `[a, b]` + `a = x AND b IN [..]`: `intermediate_values = [x]` → `intermediate_indexes = [a]` → zip pairs `(a, x)`. Correct.
- `[a, b, c]` + `a = x AND b = y AND c IN [..]`: `intermediate_values = [x, y]` → `intermediate_indexes = [a, b]` → zip pairs `(a, x), (b, y)`. Correct.
- `[a, b, c]` + `a IN [..] AND b = y AND c = z`: `intermediate_values = [y, z]` (the In contributes no value because it's not in `equal_clauses`) → `intermediate_indexes = [a, b]` → zip pairs `(a, y), (b, z)` — `y` ends up under name `a`, `z` under `b`. The path it would build is structurally wrong, so `Index::matches` refuses to even let the picker choose this shape.
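The three cases above can be reproduced with a minimal standalone sketch. `positional_pairs` is a hypothetical helper, not a function in drive; it models property names and values as plain strings and only mirrors the `split_at` + `zip` step from the excerpt.

```rust
/// Hypothetical helper (not part of drive): pairs index property names with
/// Equal-clause values purely by position, like the `split_at` + `zip` step.
fn positional_pairs<'a>(
    index_props: &[&'a str],
    intermediate_values: &[&'a str],
) -> Vec<(&'a str, &'a str)> {
    // The first N property names are assumed to belong to the N Equal values.
    let (intermediate_indexes, _last_indexes) =
        index_props.split_at(intermediate_values.len());
    intermediate_indexes
        .iter()
        .copied()
        .zip(intermediate_values.iter().copied())
        .collect()
}
```

With `["a", "b", "c"]` and values `["x", "y"]` (the `c IN [..]` case) the pairs line up with their own properties. With values `["y", "z"]` (the `a IN [..]` case) the same call pairs `y` with `a` and `z` with `b`, because nothing in the zip knows that `a` carries the `In`.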
There's special-case logic (lines 1947-1973) for the `In + range` compound case via a `subquery_clause`, which handles a single trailing leaf clause via `set_subquery_key` + `set_subquery`. But there's no `set_subquery_path` chaining for "In followed by multiple trailing Equals."
Possible Solution
Restructure path construction to either:
- Track In's position explicitly and stop the outer path at the In-bearing property's name subtree, then use `set_subquery_path` with the post-In Equals' `(prop_name, serialized_value)` pairs (plus any range-leaf subquery), then `set_subquery_key([0])` / `set_subquery` for the document descent. This is the same mechanism `distinct_count_path_query` and `point_lookup_count_path_query` already use in the count path (see `drive_document_count_query/path_query.rs`).
- Replace the positional `intermediate_indexes ↔ intermediate_values` zip with a name-keyed traversal of `index.properties` (same shape `expand_paths_and_count` already uses in the count path's no-proof executor).
Once `get_non_primary_key_path_query` no longer requires positional alignment, the `Index::matches` guard can be loosened — the rule becomes "exactly one `In` clause on the chosen index, anywhere," matching the constraint the count path now accepts.
Required follow-up touchpoints:
- `OrderClause` validation when In is on a non-terminal position (which property defines the walk order? grovedb's `set_subquery_path` doesn't allow re-ordering within the descent, so the order_by surface needs to be specified)
- `startsAt`/`startAfter` cursor semantics for In-on-prefix — current cursor logic assumes a single walk dimension
- `limit/offset` semantics across the cartesian fork (the prove path will need the same "validate-don't-clamp" treatment as the count path's range modes — `max_query_limit` rejection rather than silent clamping, to preserve proof determinism)
- E2E test coverage across the new shapes; existing query tests focus on `In + range` and `In on terminal` only
Alternatives Considered
- Keep the restriction as-is. Documented and consistent, but users hitting natural query shapes (`brand IN [..] AND model = x AND year = y` on a `[brand, model, year]` index) need to either redesign their index or fall back to per-In-value sequential queries.
- Lift on count path only. Already done in #3623 (commit `18c13b0f41`). The count path is more permissive than the regular document query today: `a IN [..] AND b = y AND c = z` is supported for counting but not for fetching documents. Surfacing this asymmetry to users is reasonable as an interim step, but a unified contract is the right end state.
Additional Context
The count path's relaxation in #3623 demonstrates that the underlying grovedb primitives (`set_subquery_path` + `set_subquery`) support this shape; the work is entirely in drive's regular document query construction logic. The count path's `point_lookup_count_path_query` is a working reference for the structure — adapted to drive's document materialization path, the principle is the same:
- `base_path` ends at the In-bearing property's name subtree
- outer `Query` keys = sorted serialized In values (lex-asc, for cursor / pagination determinism)
- `set_subquery_path` carries the post-In Equal pairs in index order
- `set_subquery` is the document-descent walk (the existing `recursive_create_query` / `recursive_insert_on_query` logic, but rooted at the resolved leaf rather than at the index level)
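A rough sketch of that structure as plain data, using a name-keyed walk over the index properties instead of a positional zip. Everything here is hypothetical: `Clause`, `InPlan` and `plan_in_query` are illustrative names, not drive or grovedb types, and the sketch assumes every index property has either an Equal or an In clause (no range leaf, no document descent).

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified clause model; not drive's actual types.
#[derive(Debug, Clone, PartialEq)]
enum Clause {
    Equal(Vec<u8>),
    In(Vec<Vec<u8>>),
}

/// Hypothetical plan: plain data standing in for the grovedb path and query.
#[derive(Debug, PartialEq)]
struct InPlan {
    /// Path segments up to and including the In-bearing property's name.
    base_path: Vec<Vec<u8>>,
    /// Outer query keys: the serialized In values, sorted ascending.
    in_keys: Vec<Vec<u8>>,
    /// (name, value) segments for the Equal clauses after the In, in index order.
    subquery_path: Vec<Vec<u8>>,
}

/// Walks the index properties by name, so each value lands under its own property.
/// Returns `None` if a property has no clause, or if there is not exactly one In.
fn plan_in_query(index_props: &[&str], clauses: &BTreeMap<&str, Clause>) -> Option<InPlan> {
    let mut base_path = Vec::new();
    let mut in_keys: Option<Vec<Vec<u8>>> = None;
    let mut subquery_path = Vec::new();

    for prop in index_props {
        let name = prop.as_bytes().to_vec();
        match (clauses.get(prop)?, in_keys.is_some()) {
            // Equal before the In: part of the fixed outer path.
            (Clause::Equal(value), false) => {
                base_path.push(name);
                base_path.push(value.clone());
            }
            // Equal after the In: carried by the subquery path.
            (Clause::Equal(value), true) => {
                subquery_path.push(name);
                subquery_path.push(value.clone());
            }
            // The single In: the outer path stops at this property's name.
            (Clause::In(values), false) => {
                base_path.push(name);
                let mut keys = values.clone();
                keys.sort(); // byte-wise lexicographic ascending order
                in_keys = Some(keys);
            }
            // A second In is outside the "exactly one In" rule.
            (Clause::In(_), true) => return None,
        }
    }

    Some(InPlan { base_path, in_keys: in_keys?, subquery_path })
}
```

For `a IN [n, m] AND b = y AND c = z` on `[a, b, c]`, this yields a base path of `[a]`, In keys `[m, n]`, and a subquery path of `[b, y, c, z]`. In a real implementation those three pieces would feed the outer path, the outer `Query` keys, and `set_subquery_path` respectively.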
Because the project is still pre-release, this could land as a breaking change. After release, it would need to be gated behind a platform-version bump.