[CALCITE-7608] Introduce a SelectMany operator #5031

Open
mihaibudiu wants to merge 4 commits into apache:main from mihaibudiu:issue7608
mihaibudiu commented Jun 19, 2026

Contributor

Jira Link

CALCITE-7608

Changes Proposed

This PR introduces a new operator SelectMany, which generalizes Uncollect. The operator combines Correlate + Unnest and has two forms: inner join and left join.

The PR is divided into four commits which should probably be left separate:

  • stronger validation for UNNEST, which only accepts INNER and LEFT JOINs
  • introduce SelectMany and its Logical variant, and add support to the RelBuilder
  • introduce a CoreRule which rewrites Correlate + Unnest into SelectMany. The rule is not enabled by default for backwards compatibility. This rule should probably be executed before the decorrelator. In the future this rewrite could be moved into SqlToRelConverter as well.
  • introduce a runtime implementation for SelectMany and the associated Java code generation

This operator is strictly more expressive than Uncollect (I believe that the existing Uncollect does not even support LEFT JOINs). In the long term, Uncollect should ideally be deprecated in favor of this operator.
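To make the two forms concrete, here is a minimal sketch of the intended row-level semantics in plain Java. It is not code from this PR and does not use any Calcite API: the class `SelectManySketch`, the method `selectMany`, and the map-based row model are assumptions made for illustration. It also assumes that the LEFT form emits a single NULL-padded row when the array is empty or NULL.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the inner and left forms; hypothetical, not Calcite code. */
class SelectManySketch {
  /**
   * Each map entry stands for one input row: the key is a pass-through
   * column and the value is the array-valued column to expand.
   */
  static List<List<Object>> selectMany(
      Map<String, List<Integer>> input, boolean leftJoin) {
    List<List<Object>> out = new ArrayList<>();
    for (Map.Entry<String, List<Integer>> row : input.entrySet()) {
      List<Integer> arr = row.getValue();
      if (arr == null || arr.isEmpty()) {
        // Inner form drops the row; left form keeps it, padded with NULL.
        if (leftJoin) {
          out.add(Arrays.<Object>asList(row.getKey(), null));
        }
        continue;
      }
      for (Integer element : arr) {
        out.add(Arrays.<Object>asList(row.getKey(), element));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, List<Integer>> input = new LinkedHashMap<>();
    input.put("x", Arrays.asList(1, 2));
    input.put("y", Collections.<Integer>emptyList());
    System.out.println(selectMany(input, false)); // [[x, 1], [x, 2]]
    System.out.println(selectMany(input, true));  // [[x, 1], [x, 2], [y, null]]
  }
}
```

The only difference between the two forms in this sketch is what happens to rows whose collection produces no elements.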

Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
@iwanttobepowerful

Contributor

We believe this PR is quite important, so I’d really appreciate it if you could take another look when you have a moment. @zabetak @julianhyde

# Validated on Postgres
SELECT n, a, b
FROM (SELECT 'test' AS n, ARRAY[1, 2, 3] AS arr1, ARRAY[10, 20] AS arr2) AS u,
UNNEST(u.arr1, u.arr2) AS t(a, b);

Contributor

How about adding ORDER BY a?

Contributor Author

To make the output deterministic?

Contributor

Yes.

iwanttobepowerful commented Jun 20, 2026

Contributor

Generating LogicalSelectMany directly within SqlToRelConverter would feel cleaner and more natural, I think. I’m fully on board with deprecating Uncollect in favor of this new operator.
Also, would it be reasonable to rename LogicalSelectMany to LogicalUnnest? The current name feels a bit too LINQ-specific for a core Calcite relational operator.

iwanttobepowerful commented Jun 20, 2026

Contributor

Thanks a lot for introducing SelectMany — this is a much cleaner abstraction than the current Correlate + Uncollect pattern.

You note in the description that "in the future this rewrite could be moved into SqlToRelConverter as well." I'd like to make the case for doing (at least an initial version of) that in this PR, because I think the rule-based approach has an inherent limitation:

A pattern-matching rule only fires when the plan matches the exact shape it expects, and that shape is easily perturbed — by an intervening Project or pushed-down Filter, a trait change, or decorrelation running first. When that happens, a perfectly convertible query silently misses the rewrite and falls back to the less efficient Correlate form, and trying to generalize the pattern to cover every shape tends to become an endless game of whack-a-mole.
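As a rough illustration of how easily the shape is perturbed, the toy matcher below only recognizes a Correlate whose right input is directly an Uncollect. The `Node` class and `matchesCorrelateUnnest` are hypothetical stand-ins, not Calcite's RelNode or RelRule API, and the real rule in this PR may match a different shape.

```java
import java.util.Arrays;
import java.util.List;

/** Toy plan tree and matcher; hypothetical, not Calcite's planner API. */
class RuleShapeSketch {
  static final class Node {
    final String op;
    final List<Node> inputs;

    Node(String op, Node... inputs) {
      this.op = op;
      this.inputs = Arrays.asList(inputs);
    }
  }

  /** Matches only Correlate(left, Uncollect(...)), with nothing in between. */
  static boolean matchesCorrelateUnnest(Node n) {
    return n.op.equals("Correlate")
        && n.inputs.size() == 2
        && n.inputs.get(1).op.equals("Uncollect");
  }

  public static void main(String[] args) {
    Node direct = new Node("Correlate",
        new Node("Scan"),
        new Node("Uncollect", new Node("Project")));
    // The same query after a Filter was pushed onto the right side.
    Node perturbed = new Node("Correlate",
        new Node("Scan"),
        new Node("Filter", new Node("Uncollect", new Node("Project"))));

    System.out.println(matchesCorrelateUnnest(direct));    // true
    System.out.println(matchesCorrelateUnnest(perturbed)); // false
  }
}
```

Both trees describe a convertible query, but only the first one is rewritten; covering the second would need another pattern, and so on for every other intervening operator.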

Generating LogicalSelectMany directly in convertUnnest() sidesteps this. The converter already has everything it needs — the input row, the array expressions, withOrdinality, and the join type. By projecting both the pass-through columns and the collection columns onto the same input row, the array expressions stay as ordinary input references (no correlation variable is ever introduced), and SelectMany can be emitted deterministically — before any optimization can disturb the shape. This makes FROM t CROSS/LEFT JOIN UNNEST(t.arr) skip both the Correlate intermediate form and the subsequent decorrelation step entirely, and guarantees the operator is produced whenever it's applicable, rather than "whenever the rule happens to match."
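A small sketch of the "same input row" idea, under stated assumptions: rows are plain lists, and both the pass-through columns and the collection column are addressed by index into that one row, so nothing plays the role of a correlation variable. The names `InputRefUnnestSketch`, `unnest`, `passThroughRefs`, and `collectionRef` are invented for this example and are not part of SqlToRelConverter or of this PR. Only the inner form is modeled, and the collection column is assumed to be non-NULL.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of unnesting via plain input references; hypothetical. */
class InputRefUnnestSketch {
  /**
   * Expands each row by the list stored at column index collectionRef.
   * passThroughRefs and collectionRef are ordinary column indexes into
   * the same input row.
   */
  static List<List<Object>> unnest(List<List<Object>> rows,
      int[] passThroughRefs, int collectionRef, boolean withOrdinality) {
    List<List<Object>> out = new ArrayList<>();
    for (List<Object> row : rows) {
      List<?> elements = (List<?>) row.get(collectionRef);
      int ordinal = 1; // ordinality is 1-based in SQL
      for (Object element : elements) {
        List<Object> result = new ArrayList<>();
        for (int ref : passThroughRefs) {
          result.add(row.get(ref));
        }
        result.add(element);
        if (withOrdinality) {
          result.add(ordinal);
        }
        ordinal++;
        out.add(result);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<List<Object>> rows = new ArrayList<>();
    rows.add(Arrays.<Object>asList("test", Arrays.asList(10, 20)));
    System.out.println(unnest(rows, new int[] {0}, 1, true));
    // prints [[test, 10, 1], [test, 20, 2]]
  }
}
```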

The two approaches are complementary, not competing: the rule is still valuable for plans that already contain Correlate + Unnest (e.g. coming from other frontends), while direct generation covers the SQL path. To stay backwards-compatible, the sql2rel path could be gated behind a SqlToRelConverter.Config flag (off by default), mirroring how you disabled the rewrite rule by default.

Happy to help with this if you think it's in scope — otherwise it could make a good follow-up.

@mihaibudiu

Contributor Author

There are so many config flags in Calcite...

Thinking about this more, having a way to do it in SqlToRelConverter is probably very useful, so we should offer this possibility.
Deprecating Uncollect is a big move; it may cause problems for other people who depend on it.
We should probably give it some time.

SelectMany is actually not a very good name: the "real" selectMany is much more general, since it takes an arbitrary function which returns an iterator. We could call it LogicalUnnest.

Let's see if there are other comments which I can address in an updated commit. I will try to add it to SqlToRel, but I'm not sure I will get to it during the weekend. Happy to get other naming suggestions as well.
