fix(drill-detail): paginate Elasticsearch samples via engine cursor#39509
fix(drill-detail): paginate Elasticsearch samples via engine cursor#39509atrsa wants to merge 14 commits intoapache:masterfrom
Conversation
Engines like Elasticsearch SQL do not support OFFSET; emitting the clause crashes the parser with 'mismatched input OFFSET expecting <EOF>'. Guard the .offset() call with db_engine_spec.allows_offset_fetch.
Engines without SQL OFFSET support (Elasticsearch, OpenDistro) now paginate drill-to-detail samples through the driver's cursor API instead of emitting OFFSET. Adds `fetch_data_with_cursor` on both engine specs and branches `get_samples` on the `allows_offset_fetch` capability flag. Exposes the flag via `EngineInformationSchema` and documents it in UPDATING.md.
|
|
||
| responses_iter = iter(transport_responses) | ||
|
|
||
| def perform_request(method, path, body=None, **_kwargs): |
There was a problem hiding this comment.
Suggestion: Add explicit parameter and return type annotations to the newly added nested request helper to satisfy typing requirements for new functions. [custom_rule]
Severity Level: Minor
| def perform_request(method, path, body=None, **_kwargs): | |
| def perform_request( | |
| method: str, | |
| path: str, | |
| body: dict[str, Any] | None = None, | |
| **_kwargs: Any, | |
| ) -> dict[str, Any]: |
Why it matters? 🤔
The existing helper is newly introduced and lacks explicit type annotations, which matches the stated typing rule. The improved code adds annotations for all parameters and the return type, and it uses names and types already available in the file (Any is imported). The syntax is valid for the codebase's Python version.
Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** tests/unit_tests/db_engine_specs/test_elasticsearch.py
**Line:** 126:126
**Comment:**
*Custom Rule: Add explicit parameter and return type annotations to the newly added nested request helper to satisfy typing requirements for new functions.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix| from superset.db_engine_specs.elasticsearch import ElasticSearchEngineSpec | ||
|
|
||
| class BoomError(RuntimeError): | ||
| pass |
There was a problem hiding this comment.
Suggestion: Add a class docstring to the new exception class so newly introduced classes include documentation. [custom_rule]
Severity Level: Minor
| pass | |
| """Raised when the mocked transport fails during cursor iteration.""" |
Why it matters? 🤔
The new class is introduced without a docstring, so this directly addresses a class-documentation rule. Adding the docstring is syntactically valid and does not affect runtime behavior.
Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** tests/unit_tests/db_engine_specs/test_elasticsearch.py
**Line:** 273:273
**Comment:**
*Custom Rule: Add a class docstring to the new exception class so newly introduced classes include documentation.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix| call_count = {"n": 0} | ||
| recorded_close = {} | ||
|
|
||
| def perform_request(method, path, body=None, **_kwargs): |
There was a problem hiding this comment.
Suggestion: Add explicit parameter and return type annotations to this newly added nested request helper to comply with the type-hinting rule for new functions. [custom_rule]
Severity Level: Minor
| def perform_request(method, path, body=None, **_kwargs): | |
| def perform_request( | |
| method: str, | |
| path: str, | |
| body: dict[str, Any] | None = None, | |
| **_kwargs: Any, | |
| ) -> dict[str, Any]: |
Why it matters? 🤔
The callback is newly added and currently untyped, so the type-hinting rule is violated if that rule applies to new functions. The proposed replacement adds explicit types without changing behavior, and all referenced symbols (Any, responses, BoomError) exist in the surrounding test.
Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** tests/unit_tests/db_engine_specs/test_elasticsearch.py
**Line:** 282:282
**Comment:**
*Custom Rule: Add explicit parameter and return type annotations to this newly added nested request helper to comply with the type-hinting rule for new functions.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix| unguarded: list[int] = [] | ||
|
|
||
| class Visitor(ast.NodeVisitor): | ||
| def __init__(self) -> None: |
There was a problem hiding this comment.
Suggestion: Add a docstring to the newly introduced inner class so the class definition is self-documented. [custom_rule]
Severity Level: Minor
| def __init__(self) -> None: | |
| """Walk the AST and track whether offset assignments are properly guarded.""" |
Why it matters? 🤔
The suggestion addresses a real docstring omission in the newly introduced inner class. Adding a class docstring is syntactically valid and does not affect runtime behavior.
Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** tests/unit_tests/models/test_helpers_offset.py
**Line:** 64:64
**Comment:**
*Custom Rule: Add a docstring to the newly introduced inner class so the class definition is self-documented.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix|
|
||
| class Visitor(ast.NodeVisitor): | ||
| def __init__(self) -> None: | ||
| self._in_guarded_if = 0 |
There was a problem hiding this comment.
Suggestion: Add a docstring to the new initializer method to satisfy the docstring requirement for newly added functions. [custom_rule]
Severity Level: Minor
| self._in_guarded_if = 0 | |
| """Initialize the guard-depth tracker for conditional offset checks.""" |
Why it matters? 🤔
The initializer is newly added and lacks a docstring, so the suggestion matches the rule violation. The added docstring is valid Python and does not change behavior.
Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** tests/unit_tests/models/test_helpers_offset.py
**Line:** 65:65
**Comment:**
*Custom Rule: Add a docstring to the new initializer method to satisfy the docstring requirement for newly added functions.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix| self._in_guarded_if = 0 | ||
|
|
||
| def visit_If(self, node: ast.If) -> None: # noqa: N802 | ||
| if _uses_supports_offset(node.test): |
There was a problem hiding this comment.
Suggestion: Add a docstring to the new visitor method so all newly added functions include documentation. [custom_rule]
Severity Level: Minor
| if _uses_supports_offset(node.test): | |
| """Track guarded branches when an `if` condition checks `supports_offset`.""" |
Why it matters? 🤔
This is a newly introduced method without a docstring, so the suggestion is aligned with the docstring rule. The improved code is syntactically correct and only adds documentation.
Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** tests/unit_tests/models/test_helpers_offset.py
**Line:** 68:68
**Comment:**
*Custom Rule: Add a docstring to the new visitor method so all newly added functions include documentation.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix| self.generic_visit(node) | ||
|
|
||
| def visit_Assign(self, node: ast.Assign) -> None: # noqa: N802 | ||
| if _is_qry_offset_assignment(node) and self._in_guarded_if == 0: |
There was a problem hiding this comment.
Suggestion: Add a docstring to the new assignment visitor method to comply with the docstring rule for added functions. [custom_rule]
Severity Level: Minor
| if _is_qry_offset_assignment(node) and self._in_guarded_if == 0: | |
| """Record line numbers for offset assignments that are outside guarded blocks.""" |
Why it matters? 🤔
The new visitor method is missing a docstring, so the suggestion correctly addresses the violation. The added string literal is valid and does not introduce any code changes beyond documentation.
Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** tests/unit_tests/models/test_helpers_offset.py
**Line:** 79:79
**Comment:**
*Custom Rule: Add a docstring to the new assignment visitor method to comply with the docstring rule for added functions.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
✅ Deploy Preview for superset-docs-preview ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
There was a problem hiding this comment.
Code Review Agent Run #0b865c
Actionable Suggestions - 3
-
tests/unit_tests/models/test_helpers_offset.py - 1
- Use of assert in production code · Line 55-55
-
tests/unit_tests/db_engine_specs/test_elasticsearch.py - 1
- Wrong parametrize argument type · Line 316-317
-
superset/db_engine_specs/elasticsearch.py - 1
- Blind exception catch should be specific · Line 121-121
Filtered by Review Rules
Bito filtered these suggestions based on rules created automatically for your feedback. Manage rules.
-
superset/views/datasource/utils.py - 1
- Broad Exception Catch · Line 193-193
Review Details
-
Files reviewed - 10 · Commit Range:
30067ad..1ec5279- superset/databases/schemas.py
- superset/db_engine_specs/base.py
- superset/db_engine_specs/elasticsearch.py
- superset/models/helpers.py
- superset/views/datasource/utils.py
- tests/unit_tests/databases/test_schemas.py
- tests/unit_tests/db_engine_specs/test_base.py
- tests/unit_tests/db_engine_specs/test_elasticsearch.py
- tests/unit_tests/models/test_helpers_offset.py
- tests/unit_tests/views/datasource/test_utils.py
-
Files skipped - 1
- UPDATING.md - Reason: Filter setting
-
Tools
- Whispers (Secret Scanner) - ✔︎ Successful
- Detect-secrets (Secret Scanner) - ✔︎ Successful
- MyPy (Static Code Analysis) - ✔︎ Successful
- Astral Ruff (Static Code Analysis) - ✔︎ Successful
Bito Usage Guide
Commands
Type the following command in the pull request comment and save the comment.
-
/review- Manually triggers a full AI review. -
/pause- Pauses automatic reviews on this pull request. -
/resume- Resumes automatic reviews. -
/resolve- Marks all Bito-posted review comments as resolved. -
/abort- Cancels all in-progress reviews.
Refer to the documentation for additional commands.
Configuration
This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.
Documentation & Help
| Black-style reformatting and trivial refactors. | ||
| """ | ||
| source = HELPERS_PATH.read_text() | ||
| assert "supports_offset" in source, ( |
There was a problem hiding this comment.
Replace assert statement with explicit error handling using pytest.fail() or raise an exception, as assert can be disabled with Python's -O flag.
Code suggestion
Check the AI-generated fix before applying
from pathlib import Path
+import pytest
HELPERS_PATH = (
Path(__file__).resolve().parents[3] / "superset" / "models" / "helpers.py"
@@ -54,10 +55,10 @@
"""
source = HELPERS_PATH.read_text()
- assert "supports_offset" in source, (
+ if "supports_offset" not in source:
+ pytest.fail(
"helpers.py no longer references supports_offset; the OFFSET "
"guard is gone — Elasticsearch drill-to-detail will crash on page 2+."
- )
+ )
Code Review Run #0b865c
Should Bito avoid suggestions like this for future reviews? (Manage Rules)
- Yes, avoid them
| @pytest.mark.parametrize( | ||
| "sql_in,expected_query", |
There was a problem hiding this comment.
Change the first argument to pytest.mark.parametrize on line 317 from a string to a tuple. Use ("sql_in", "expected_query") instead of "sql_in,expected_query".
Code suggestion
Check the AI-generated fix before applying
| @pytest.mark.parametrize( | |
| "sql_in,expected_query", | |
| ("sql_in", "expected_query"), |
Code Review Run #0b865c
Should Bito avoid suggestions like this for future reviews? (Manage Rules)
- Yes, avoid them
| headers=json_headers, | ||
| body={"cursor": cursor}, | ||
| ) | ||
| except Exception: # pylint: disable=broad-except |
There was a problem hiding this comment.
Replace the broad Exception catch on line 121 with a specific exception type (e.g., ConnectionError, TimeoutError, or RequestException) to avoid masking unexpected errors.
Code suggestion
Check the AI-generated fix before applying
| except Exception: # pylint: disable=broad-except | |
| except (ConnectionError, TimeoutError, Exception): # pylint: disable=broad-except |
Code Review Run #0b865c
Should Bito avoid suggestions like this for future reviews? (Manage Rules)
- Yes, avoid them
SUMMARY
Drill to Detail paginates sample rows with
LIMIT/OFFSET, which is unsupported by the Elasticsearch and OpenDistro/OpenSearch SQL APIs. Page 1 loads, but any click beyond that throwsparsing_exception: mismatched input 'OFFSET'and the drill-detail modal errors out.This PR introduces a small engine-spec capability (allows_offset_fetch, default True) and an optional cursor-based pagination hook (fetch_data_with_cursor) on BaseEngineSpec. The changes are scoped:
superset/models/helpers.py): skipqry.offset(...)when the backing engine opts out. Page 1 behaviour is unchanged becauserow_offset == 0.superset/db_engine_specs/elasticsearch.py): opt out ofOFFSET, and implementfetch_data_with_cursoragainst the native ES SQL cursor (/_sql+/_sql/close, or/_opendistro/_sql+/_opendistro/_sql/close). The helper strips trailing ; and a trailing LIMIT N from the submitted SQL (both break ES cursor semantics) and setsContent-Type: application/jsonon the raw transport.superset/views/datasource/utils.py): for engines that can't doOFFSET, pages > 1 delegate to the engine-spec cursor helper. Colnames/coltypes are sourced from the normal samples payload so the frontend grid renders identically to page 1. Count-star cache is evicted on failure, matching the existing FAILED path.superset/databases/schemas.py): exposeallows_offset_fetchonEngineInformationSchemafor completeness.No other engine specs are touched; no behaviour change for Postgres, MySQL,
BigQuery, etc.
PERFORMANCE NOTE
The ES SQL cursor API is forward-only, so for engines on the cursor path (Elasticsearch, OpenDistro/OpenSearch) reaching page N of drill-to-detail issues N round trips to the cluster. Deep pagination cost is therefore linear in page number, not constant like OFFSET-capable engines. In practice drill-to-detail is used to skim the first handful of pages, so this is a reasonable trade-off — but if a user paginates into the hundreds or thousands of pages, latency will grow proportionally. Documented in UPDATING.md and in the _fetch_page_via_cursor docstring so forks can reason about the cost.
Fixes #24563
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
elasticsearch-dbapiinstalled and an ES 7.17side-container.
POST /datasource/samples?datasource_type=table&datasource_id=<id>&force=true&pa ge=N&per_page=50for N = 1, 2, 3.Before this PR (stock 5.0.0):
expecting ".
After this PR:
ADDITIONAL INFORMATION