Skip to content

Stabilize CalciteStreamstatsCommandIT on the analytics-engine route#5582

Merged
ahkcs merged 1 commit into
opensearch-project:mainfrom
ahkcs:feature/stabilize-streamstats-analytics-route
Jun 24, 2026
Merged

Stabilize CalciteStreamstatsCommandIT on the analytics-engine route#5582
ahkcs merged 1 commit into
opensearch-project:mainfrom
ahkcs:feature/stabilize-streamstats-analytics-route

Conversation

@ahkcs

@ahkcs ahkcs commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator

Description

Gate the CalciteStreamstatsCommandIT tests that diverge on the analytics-engine route (parquet-backed composite store, DataFusion backend) so the route runs green, while keeping every test active on the v2/Calcite path. Follows the established capability-gating pattern (#5560): an in-test @RequiresCapability(...) annotation plus a matching integTestRemote excludeTestsMatching entry — both no-ops on the v2 route.

Triaged single-shard and multi-shard (num_shards=3) analytics runs against the v2 baseline. Failures fall into three groups:

DOC_MUTATION (4 tests)testStreamstatsGlobalWithNull, testStreamstatsGlobalWithNullBucket, testStreamstatsResetWithNull, testStreamstatsResetWithNullBucket seed state via PUT+DELETE. Doc-level DELETE is unsupported on the parquet store, and same-_id PUT is append-only, so the leaked doc inflated the row counts of every sibling test reading the shared index. Gated with the same DOC_MUTATION capability the three existing mutation tests already carry.

CHAINED_STREAMSTATS_BY (4 tests)testMultipleStreamstats, testMultipleStreamstatsWithWindow, testMultipleStreamstatsWithNull1, testMultipleStreamstatsWithEval. Chaining two streamstats where an upstream stage partitions by a group emits a ROW_NUMBER() sequence column from each stage; the Substrait converter names both physical columns identically ("row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW"), so the stacked schema has a duplicate/ambiguous field name (500) or, for chained window streamstats, non-deterministic values. The Calcite logical plan is correct (the __stream_seq__ alias is distinct) — the alias is lost in Substrait conversion. A single streamstats by, or a chain where only the final stage has by, works. Fails single- and multi-shard.

STREAMSTATS_SORT_NOT_HONORED (1 test)testStreamstatsAndSort. The window is computed over the backend scan order, ignoring a preceding | sort (the OVER clause carries no explicit ORDER BY, since streamstats orders by encounter order by design), so the per-row aggregates diverge from the v2/Calcite path.

Pass rate — single-shard analytics route, CalciteStreamstatsCommandIT

metric before after
tests run 47 42
failures 12 0
skipped 3 7

The before-failures count is inflated by the DOC_MUTATION leak above; the four leaking tests plus their downstream row-count victims all clear once gated.

v2 route (:integTest): 47 run, 0 failed, 0 skipped — the gates are no-ops off the analytics route.

Out of scope

Twelve further tests fail only on the multi-shard route (they pass single-shard) due to cross-shard fragment-order non-determinism in the streamstats window gather (the coordinator gather has no shard-ordinal tiebreaker, so a preceding | sort can't recover deterministic order). That is an engine-side gap; these are left unchanged rather than gated, to avoid skipping passing tests on the single-shard route.

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • Commits are signed per the DCO using --signoff.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Gate the streamstats ITs that diverge on the analytics-engine route
(parquet-backed composite store, DataFusion backend) so the route runs
green, while keeping every test active on the v2/Calcite path. Mechanism
follows the established capability-gating pattern (opensearch-project#5560): an in-test
@RequiresCapability(...) annotation plus a matching integTestRemote
excludeTestsMatching entry; both are no-ops on the v2 route.

Triaged single-shard and multi-shard (num_shards=3) analytics runs against
the v2 baseline. Three groups:

- DOC_MUTATION (4 tests): testStreamstatsGlobalWithNull,
  testStreamstatsGlobalWithNullBucket, testStreamstatsResetWithNull,
  testStreamstatsResetWithNullBucket seed state via PUT+DELETE. Doc-level
  DELETE is unsupported on the parquet store, and same-_id PUT is
  append-only, so the leaked doc inflated the row counts of every sibling
  test that reads the shared index. Gated with the same DOC_MUTATION
  capability the three existing mutation tests already carry.

- CHAINED_STREAMSTATS_BY (4 tests): chaining two streamstats where an
  upstream stage partitions `by` a group emits a ROW_NUMBER() sequence
  column from each stage; the Substrait converter names both physical
  columns identically, so the stacked schema has a duplicate/ambiguous
  field name (500) or, for chained window streamstats, non-deterministic
  values. The Calcite logical plan is correct; the alias is lost in
  Substrait conversion. Fails single- and multi-shard.

- STREAMSTATS_SORT_NOT_HONORED (1 test): testStreamstatsAndSort. The window
  is computed over the backend scan order, ignoring a preceding `| sort`
  (the OVER clause carries no explicit ORDER BY), so the per-row aggregates
  diverge from the v2/Calcite path.

Pass rate on the single-shard analytics route, CalciteStreamstatsCommandIT:

| metric    | before | after |
|-----------|--------|-------|
| tests run | 47     | 42    |
| failures  | 12     | 0     |
| skipped   | 3      | 7     |

(The before-failures count is inflated by the DOC_MUTATION leak described
above; the four leaking tests plus their downstream row-count victims all
clear once gated.)

v2 route (:integTest): 47 run, 0 failed, 0 skipped — gates are no-ops
off the analytics route.

Twelve further tests fail only on the multi-shard route (they pass
single-shard) due to cross-shard fragment-order non-determinism in the
streamstats window gather; that is an engine-side gap and is left
unchanged here rather than gated, to avoid skipping passing tests on the
single-shard route.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
* for chained {@code window} streamstats, non-deterministic window values. A single {@code
* streamstats by} (or a chain where only the final stage has {@code by}) works.
*/
CHAINED_STREAMSTATS_BY(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, let's think how to maintain real backend capability here later. Or maybe there is better mechanism to make our IT work with various engines with different capabilities. Thanks.

@ahkcs ahkcs merged commit e99aff0 into opensearch-project:main Jun 24, 2026
29 of 34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

infrastructure Changes to infrastructure, testing, CI/CD, pipelines, etc.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants