[fix](subquery) Preserve outer-scope columns in correlated subqueries containing CTE #63040

Open
starocean999 wants to merge 2 commits into apache:master from starocean999:master_0506

@starocean999
Contributor

Problem: Correlated subqueries that include an inner WITH (CTE) could lose references to outer-scope columns during analysis, causing incorrect planning or query rejection.

Fix: Update the Nereids CTE and correlated-subquery analysis to preserve outer-scope symbols across CTE boundaries, and add validation that rejects unsafe outer-column references inside CTE producers.
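
To make the scoping rule described above concrete, here is a minimal toy model in Python. It is not Nereids code: the `Scope` class, the `check_cte_producer` helper, and the column names `t1.a`, `t2.b`, and `c.b` are all hypothetical, chosen only to illustrate the idea that the subquery body consuming a CTE keeps access to outer columns while the CTE producer is rejected if it references one.

```python
class Scope:
    """A set of visible column names with an optional enclosing scope."""

    def __init__(self, columns, outer=None):
        self.columns = set(columns)
        self.outer = outer

    def resolve(self, name):
        """Return how many scope levels up `name` was found (0 means local)."""
        depth, scope = 0, self
        while scope is not None:
            if name in scope.columns:
                return depth
            depth, scope = depth + 1, scope.outer
        raise LookupError(f"unknown column: {name}")


def check_cte_producer(producer_scope, referenced):
    """Reject a CTE producer that references any outer-scope column."""
    for name in referenced:
        if producer_scope.resolve(name) > 0:
            raise ValueError(f"outer column {name!r} not allowed in CTE producer")


# Hypothetical scopes: an outer query block over t1, a correlated subquery
# that consumes CTE c, and the body (producer) of CTE c reading from t2.
outer = Scope({"t1.a"})
subquery = Scope({"c.b"}, outer=outer)
producer = Scope({"t2.b"}, outer=outer)

# The consumer side still resolves the outer column, one level up.
consumer_depth = subquery.resolve("t1.a")

# The producer side is rejected as soon as it touches an outer column.
try:
    check_cte_producer(producer, ["t2.b", "t1.a"])
    producer_rejected = False
except ValueError:
    producer_rejected = True
```

In this sketch the outer column stays resolvable from the subquery body, which corresponds to the "preserve" half of the fix, and the producer check corresponds to the "reject" half. How the actual analyzer represents scopes is not shown in this PR description.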

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@starocean999
Contributor Author

/review

Contributor

@github-actions github-actions Bot left a comment

I found one correctness issue that should be fixed before merging.

Critical checkpoint conclusions:

  • Goal/test: the PR fixes correlated subqueries containing CTEs and adds FE plus regression coverage for EXISTS, IN, scalar, and an unsupported correlated-CTE-producer case, but it misses nondeterministic or multi-consumer CTE semantics under subqueries.
  • Scope/focus: the implementation is small and focused, but the global must-inline marking is too broad for existing CTE materialization semantics.
  • Concurrency/lifecycle/config/compatibility: no new concurrency, lifecycle, configuration, persistence, or FE-BE protocol compatibility concerns found.
  • Parallel paths: existing CTEInline logic has explicit non-inline handling for nondeterministic or heavily referenced CTEs; this new subquery path bypasses it.
  • Tests: added tests cover the main happy path, but not the nondeterministic/materialized CTE case that regresses here. Regression output ordering is deterministic via ORDER BY.
  • Observability/performance: no additional observability need identified; performance impact is not the concern, semantics are.
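
The concern about nondeterministic or multi-consumer CTEs can be illustrated with a small Python sketch. It is an analogy, not Doris behavior: `make_producer`, `run_materialized`, and `run_inlined` are hypothetical names, and the counter stands in for any nondeterministic CTE body. The point is only that evaluating the producer once and sharing the result can give different answers from re-evaluating it for each consumer.

```python
import itertools


def make_producer():
    """Stand-in for a nondeterministic CTE body: each call yields a new value."""
    counter = itertools.count(1)
    return lambda: next(counter)


def run_materialized(producer, consumers):
    """Evaluate the producer once and share the result with every consumer."""
    value = producer()
    return [value for _ in range(consumers)]


def run_inlined(producer, consumers):
    """Re-evaluate the producer separately for each consumer."""
    return [producer() for _ in range(consumers)]


# Two consumers of the same CTE under each strategy.
materialized = run_materialized(make_producer(), consumers=2)
inlined = run_inlined(make_producer(), consumers=2)
```

With a shared result both consumers see the same value, while with inlining they diverge. That divergence is the kind of semantic change the review flags when inlining is forced for every CTE under a subquery.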

User focus: no additional user-provided review focus was specified.

@starocean999
Contributor Author

/review

Contributor

@github-actions github-actions Bot left a comment

Code review completed. I did not find additional distinct blocking issues beyond the already-open inline thread about forcing CTE inline inside subqueries, so I am not duplicating that concern in a new comment.

Critical checkpoint conclusions:

  • Goal and tests: the PR targets correlated subqueries containing inner CTEs; added FE unit coverage and a query regression case exercise EXISTS, IN, scalar, and rejected correlated CTE-producer cases.
  • Scope/focus: the implementation is small and localized to Nereids context propagation, subquery analysis, subquery-unnesting rewrite placement, and CTE inline safety.
  • Concurrency/lifecycle/config/compatibility: no new concurrency, lifecycle, configuration, storage-format, or FE-BE protocol compatibility concerns identified.
  • Parallel paths: the relevant subquery analysis and CTE inline paths were checked; the existing force-inline semantics discussion remains the main open area.
  • Tests/results: regression output is present and ordered; no additional user-provided review focus was supplied.
  • Observability/transactions/data writes: not applicable for this planner-only change.
  • Performance: no new distinct issue found beyond the existing discussion about bypassing normal CTE materialization behavior.

User focus: review_focus.txt says no additional user-provided review focus; no extra focus-specific issue found.

@starocean999
Contributor Author

run buildall

@starocean999 starocean999 marked this pull request as ready for review May 11, 2026 01:27

2 participants