fix(drill-detail): paginate Elasticsearch samples via engine cursor #39509

Open
atrsa wants to merge 14 commits into apache:master from atrsa:fix/elasticsearch-drill-detail-pagination

@atrsa atrsa commented Apr 21, 2026

SUMMARY

Drill to Detail paginates sample rows with LIMIT / OFFSET, which is unsupported by the Elasticsearch and OpenDistro/OpenSearch SQL APIs. Page 1 loads, but any click beyond that throws parsing_exception: mismatched input 'OFFSET' and the drill-detail modal errors out.

This PR introduces a small engine-spec capability (allows_offset_fetch, default True) and an optional cursor-based pagination hook (fetch_data_with_cursor) on BaseEngineSpec. The changes are scoped:

  • Query builder (superset/models/helpers.py): skip qry.offset(...) when the backing engine opts out. Page 1 behaviour is unchanged because row_offset == 0.
  • Elasticsearch / OpenDistro specs (superset/db_engine_specs/elasticsearch.py): opt out of OFFSET, and implement fetch_data_with_cursor against the native ES SQL cursor (/_sql + /_sql/close, or /_opendistro/_sql + /_opendistro/_sql/close). The helper strips trailing ; and a trailing LIMIT N from the submitted SQL (both break ES cursor semantics) and sets Content-Type: application/json on the raw transport.
  • Samples endpoint (superset/views/datasource/utils.py): for engines that can't do OFFSET, pages > 1 delegate to the engine-spec cursor helper. Colnames/coltypes are sourced from the normal samples payload so the frontend grid renders identically to page 1. Count-star cache is evicted on failure, matching the existing FAILED path.
  • API schema (superset/databases/schemas.py): expose allows_offset_fetch on EngineInformationSchema for completeness.
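
As a rough illustration of the query-builder bullet, the guard could look like the sketch below. `BaseSpec`, `ElasticLikeSpec`, `FakeQuery` and `apply_pagination` are hypothetical stand-ins, not Superset classes; only the `allows_offset_fetch` flag name and its default of `True` come from this description, and the exact condition used in `helpers.py` is an assumption.

```python
from dataclasses import dataclass, field


class BaseSpec:
    """Hypothetical stand-in for BaseEngineSpec: OFFSET is allowed by default."""

    allows_offset_fetch = True


class ElasticLikeSpec(BaseSpec):
    """Hypothetical stand-in for an engine spec that opts out of OFFSET."""

    allows_offset_fetch = False


@dataclass
class FakeQuery:
    """Tiny stand-in for a SQLAlchemy select; it only records applied clauses."""

    clauses: list = field(default_factory=list)

    def limit(self, n: int) -> "FakeQuery":
        return FakeQuery(self.clauses + [f"LIMIT {n}"])

    def offset(self, n: int) -> "FakeQuery":
        return FakeQuery(self.clauses + [f"OFFSET {n}"])


def apply_pagination(qry, spec, row_limit: int, row_offset: int):
    """Apply LIMIT always, and OFFSET only when the engine allows it."""
    qry = qry.limit(row_limit)
    # Assumed guard: skip OFFSET entirely for engines that opt out.
    if row_offset and spec.allows_offset_fetch:
        qry = qry.offset(row_offset)
    return qry
```

With `row_offset == 0` both specs produce the same clauses, which matches the claim that page 1 behaviour is unchanged.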

No other engine specs are touched; no behaviour change for Postgres, MySQL, BigQuery, etc.
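
For reference, the SQL clean-up described for the Elasticsearch helper (dropping a trailing `;` and a trailing `LIMIT N` before the statement is sent to the cursor API) might be sketched as follows. `normalize_for_cursor` and the regular expression are illustrative assumptions, not the code in this PR.

```python
import re

# Matches a LIMIT clause only when it is the last thing in the statement,
# so a LIMIT inside a subquery is left alone.
_TRAILING_LIMIT = re.compile(r"\s+LIMIT\s+\d+\s*$", re.IGNORECASE)


def normalize_for_cursor(sql: str) -> str:
    """Strip a trailing semicolon and a trailing LIMIT N (hypothetical helper)."""
    stripped = sql.strip().rstrip(";").rstrip()
    return _TRAILING_LIMIT.sub("", stripped)
```

In this sketch the page size would be passed separately (for example as `fetch_size`) rather than through `LIMIT`.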

PERFORMANCE NOTE

The ES SQL cursor API is forward-only, so for engines on the cursor path (Elasticsearch, OpenDistro/OpenSearch) reaching page N of drill-to-detail issues N round trips to the cluster. Deep pagination cost is therefore linear in page number, not constant like OFFSET-capable engines. In practice drill-to-detail is used to skim the first handful of pages, so this is a reasonable trade-off — but if a user paginates into the hundreds or thousands of pages, latency will grow proportionally. Documented in UPDATING.md and in the _fetch_page_via_cursor docstring so forks can reason about the cost.
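
The linear cost can be seen in a small simulation. This is a sketch under stated assumptions, not the PR's `_fetch_page_via_cursor`: `fetch_page_via_cursor`, `make_fake_cluster` and the cursor encoding are hypothetical, and the transport is an in-memory fake. Only the request shapes (`query` plus `fetch_size` on the first call, `cursor` on later calls, a separate close endpoint) follow the Elasticsearch SQL REST API.

```python
from typing import Any, Callable

Transport = Callable[[str, dict], dict]


def make_fake_cluster(total_rows: int) -> Transport:
    """In-memory stand-in for the ES SQL endpoint, with ids 1..total_rows."""

    def send(path: str, body: dict) -> dict:
        if path.endswith("/close"):
            return {"succeeded": True}
        if "cursor" in body:
            # The fake cursor encodes "next start:page size" (an assumption).
            start, size = (int(part) for part in body["cursor"].split(":"))
        else:
            start, size = 0, body["fetch_size"]
        end = min(start + size, total_rows)
        response: dict = {"rows": [[i] for i in range(start + 1, end + 1)]}
        if end < total_rows:
            response["cursor"] = f"{end}:{size}"
        return response

    return send


def fetch_page_via_cursor(send: Transport, sql: str, page: int, per_page: int):
    """Return (rows, fetch_round_trips) for a 1-indexed page."""
    response = send("/_sql", {"query": sql, "fetch_size": per_page})
    round_trips = 1
    for _ in range(page - 1):
        cursor = response.get("cursor")
        if not cursor:
            return [], round_trips  # ran past the last page
        response = send("/_sql", {"cursor": cursor})
        round_trips += 1
    if response.get("cursor"):
        # Release server-side state; this close call is not counted above.
        send("/_sql/close", {"cursor": response["cursor"]})
    return response.get("rows", []), round_trips
```

For example, `fetch_page_via_cursor(make_fake_cluster(180), "SELECT id FROM drill_demo", 3, 50)` walks pages 1 and 2 before returning page 3, so it performs three fetches.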

Fixes #24563

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Before After

TESTING INSTRUCTIONS

  1. Stand up Superset 5.0.0 with elasticsearch-dbapi installed and an ES 7.17
    side-container.
  2. Seed an ES index with >100 rows (e.g. drill_demo with 180 orders).
  3. Register an elasticsearch+http://… database and a dataset on the seeded index.
  4. POST /datasource/samples?datasource_type=table&datasource_id=<id>&force=true&page=N&per_page=50 for N = 1, 2, 3.
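
The request URLs for step 4 can be generated with a few lines of Python. This is only a convenience sketch: `samples_url` is a hypothetical helper, the base URL and dataset id are placeholders, and authentication or CSRF handling for the POST is left out.

```python
from urllib.parse import urlencode


def samples_url(base_url: str, datasource_id: int, page: int, per_page: int = 50) -> str:
    """Build the samples endpoint URL used in the testing steps."""
    query = urlencode(
        {
            "datasource_type": "table",
            "datasource_id": datasource_id,
            "force": "true",
            "page": page,
            "per_page": per_page,
        }
    )
    return f"{base_url.rstrip('/')}/datasource/samples?{query}"


if __name__ == "__main__":
    # Placeholder host and dataset id; substitute the values from steps 1-3.
    for n in (1, 2, 3):
        print(samples_url("http://localhost:8088", 1, n))
```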

Before this PR (stock 5.0.0):

  • Page 1 → 200 OK, 50 rows.
  • Pages 2 and 3 → "error": "Error (parsing_exception): … mismatched input 'OFFSET' expecting <EOF>".

After this PR:

  • Page 1 → 50 rows (ids 1–50).
  • Page 2 → 50 rows (ids 51–100).
  • Page 3 → 50 rows (ids 101–150).

ADDITIONAL INFORMATION

  • Has associated issue: Drill by pagination does not work with Elasticsearch #24563
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

atrsa added 13 commits April 19, 2026 14:43
Engines like Elasticsearch SQL do not support OFFSET; emitting the
clause crashes the parser with 'mismatched input OFFSET expecting <EOF>'.
Guard the .offset() call with db_engine_spec.allows_offset_fetch.
Engines without SQL OFFSET support (Elasticsearch, OpenDistro) now paginate
drill-to-detail samples through the driver's cursor API instead of emitting OFFSET.
Adds `fetch_data_with_cursor` on both engine specs and branches
`get_samples` on the `allows_offset_fetch` capability flag.
Exposes the flag via `EngineInformationSchema` and documents it in UPDATING.md.

responses_iter = iter(transport_responses)

def perform_request(method, path, body=None, **_kwargs):

Suggestion: Add explicit parameter and return type annotations to the newly added nested request helper to satisfy typing requirements for new functions. [custom_rule]

Severity Level: Minor ⚠️

Suggested change
 -def perform_request(method, path, body=None, **_kwargs):
 +def perform_request(
 +    method: str,
 +    path: str,
 +    body: dict[str, Any] | None = None,
 +    **_kwargs: Any,
 +) -> dict[str, Any]:
Why it matters? 🤔

The existing helper is newly introduced and lacks explicit type annotations, which matches the stated typing rule. The improved code adds annotations for all parameters and the return type, and it uses names and types already available in the file (Any is imported). The syntax is valid for the codebase's Python version.
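
A typed version of such a replaying test helper might look like the sketch below. It is not the test code from this PR: the `make_perform_request` factory and the `calls` list are hypothetical additions for illustration, and `Optional[...]` is used instead of `| None` so the snippet also runs on interpreters older than Python 3.10.

```python
from typing import Any, Callable, Optional


def make_perform_request(
    transport_responses: list,
) -> tuple:
    """Build a fake transport that replays canned responses in order."""
    responses_iter = iter(transport_responses)
    calls: list = []

    def perform_request(
        method: str,
        path: str,
        body: Optional[dict] = None,
        **_kwargs: Any,
    ) -> dict:
        # Record the call so a test can assert on the request sequence,
        # then hand back the next canned response.
        calls.append((method, path, body))
        return next(responses_iter)

    return perform_request, calls
```

A test can then patch the driver's transport with `perform_request` and inspect `calls` afterwards.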

**Path:** tests/unit_tests/db_engine_specs/test_elasticsearch.py
**Line:** 126:126

from superset.db_engine_specs.elasticsearch import ElasticSearchEngineSpec

class BoomError(RuntimeError):
    pass

Suggestion: Add a class docstring to the new exception class so newly introduced classes include documentation. [custom_rule]

Severity Level: Minor ⚠️

Suggested change
 -    pass
 +    """Raised when the mocked transport fails during cursor iteration."""
Why it matters? 🤔

The new class is introduced without a docstring, so this directly addresses a class-documentation rule. Adding the docstring is syntactically valid and does not affect runtime behavior.

**Path:** tests/unit_tests/db_engine_specs/test_elasticsearch.py
**Line:** 273:273

call_count = {"n": 0}
recorded_close = {}

def perform_request(method, path, body=None, **_kwargs):

Suggestion: Add explicit parameter and return type annotations to this newly added nested request helper to comply with the type-hinting rule for new functions. [custom_rule]

Severity Level: Minor ⚠️

Suggested change
 -def perform_request(method, path, body=None, **_kwargs):
 +def perform_request(
 +    method: str,
 +    path: str,
 +    body: dict[str, Any] | None = None,
 +    **_kwargs: Any,
 +) -> dict[str, Any]:
Why it matters? 🤔

The callback is newly added and currently untyped, so the type-hinting rule is violated if that rule applies to new functions. The proposed replacement adds explicit types without changing behavior, and all referenced symbols (Any, responses, BoomError) exist in the surrounding test.

**Path:** tests/unit_tests/db_engine_specs/test_elasticsearch.py
**Line:** 282:282

unguarded: list[int] = []

class Visitor(ast.NodeVisitor):
    def __init__(self) -> None:

Suggestion: Add a docstring to the newly introduced inner class so the class definition is self-documented. [custom_rule]

Severity Level: Minor ⚠️

Suggested change
 -    def __init__(self) -> None:
 +    """Walk the AST and track whether offset assignments are properly guarded."""
Why it matters? 🤔

The suggestion addresses a real docstring omission in the newly introduced inner class. Adding a class docstring is syntactically valid and does not affect runtime behavior.

**Path:** tests/unit_tests/models/test_helpers_offset.py
**Line:** 64:64


class Visitor(ast.NodeVisitor):
    def __init__(self) -> None:
        self._in_guarded_if = 0

Suggestion: Add a docstring to the new initializer method to satisfy the docstring requirement for newly added functions. [custom_rule]

Severity Level: Minor ⚠️

Suggested change
 -        self._in_guarded_if = 0
 +        """Initialize the guard-depth tracker for conditional offset checks."""
Why it matters? 🤔

The initializer is newly added and lacks a docstring, so the suggestion matches the rule violation. The added docstring is valid Python and does not change behavior.

**Path:** tests/unit_tests/models/test_helpers_offset.py
**Line:** 65:65

        self._in_guarded_if = 0

    def visit_If(self, node: ast.If) -> None:  # noqa: N802
        if _uses_supports_offset(node.test):

Suggestion: Add a docstring to the new visitor method so all newly added functions include documentation. [custom_rule]

Severity Level: Minor ⚠️

Suggested change
 -        if _uses_supports_offset(node.test):
 +        """Track guarded branches when an `if` condition checks `supports_offset`."""
Why it matters? 🤔

This is a newly introduced method without a docstring, so the suggestion is aligned with the docstring rule. The improved code is syntactically correct and only adds documentation.

**Path:** tests/unit_tests/models/test_helpers_offset.py
**Line:** 68:68

        self.generic_visit(node)

    def visit_Assign(self, node: ast.Assign) -> None:  # noqa: N802
        if _is_qry_offset_assignment(node) and self._in_guarded_if == 0:

Suggestion: Add a docstring to the new assignment visitor method to comply with the docstring rule for added functions. [custom_rule]

Severity Level: Minor ⚠️

Suggested change
 -        if _is_qry_offset_assignment(node) and self._in_guarded_if == 0:
 +        """Record line numbers for offset assignments that are outside guarded blocks."""
Why it matters? 🤔

The new visitor method is missing a docstring, so the suggestion correctly addresses the violation. The added string literal is valid and does not introduce any code changes beyond documentation.

**Path:** tests/unit_tests/models/test_helpers_offset.py
**Line:** 79:79
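
For readers piecing together the visitor from the hunks in these comments, a self-contained approximation is sketched below. It is not the test file from this PR: `find_unguarded_offsets`, `_mentions_flag` and `_is_offset_call` are hypothetical names, and the flag name is a parameter because the PR description refers to `allows_offset_fetch` while the test fragments refer to `supports_offset`.

```python
import ast


def _mentions_flag(test: ast.expr, flag: str) -> bool:
    """True if the condition reads an attribute named `flag` anywhere."""
    return any(
        isinstance(node, ast.Attribute) and node.attr == flag
        for node in ast.walk(test)
    )


def _is_offset_call(node: ast.Assign) -> bool:
    """True for assignments whose value is a `<something>.offset(...)` call."""
    value = node.value
    return (
        isinstance(value, ast.Call)
        and isinstance(value.func, ast.Attribute)
        and value.func.attr == "offset"
    )


def find_unguarded_offsets(source: str, flag: str = "allows_offset_fetch") -> list:
    """Return line numbers of `.offset()` assignments not under a flag check."""
    unguarded: list = []

    class Visitor(ast.NodeVisitor):
        def __init__(self) -> None:
            self._in_guarded_if = 0

        def visit_If(self, node: ast.If) -> None:  # noqa: N802
            guarded = _mentions_flag(node.test, flag)
            if guarded:
                self._in_guarded_if += 1
            # Simplification: the else branch is visited as guarded too.
            self.generic_visit(node)
            if guarded:
                self._in_guarded_if -= 1

        def visit_Assign(self, node: ast.Assign) -> None:  # noqa: N802
            if _is_offset_call(node) and self._in_guarded_if == 0:
                unguarded.append(node.lineno)
            self.generic_visit(node)

    Visitor().visit(ast.parse(source))
    return unguarded
```

Using a depth counter rather than a boolean keeps the check correct when guarded `if` blocks are nested.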


netlify Bot commented Apr 21, 2026

Deploy Preview for superset-docs-preview ready!

Name Link
🔨 Latest commit 057f044
🔍 Latest deploy log https://app.netlify.com/projects/superset-docs-preview/deploys/69e71e800b9f7e0008b5277c
😎 Deploy Preview https://deploy-preview-39509--superset-docs-preview.netlify.app


@bito-code-review bito-code-review Bot left a comment


Code Review Agent Run #0b865c

Actionable Suggestions - 3
  • tests/unit_tests/models/test_helpers_offset.py - 1
  • tests/unit_tests/db_engine_specs/test_elasticsearch.py - 1
  • superset/db_engine_specs/elasticsearch.py - 1
Filtered by Review Rules

Bito filtered these suggestions based on rules created automatically for your feedback. Manage rules.

  • superset/views/datasource/utils.py - 1
Review Details
  • Files reviewed - 10 · Commit Range: 30067ad..1ec5279
    • superset/databases/schemas.py
    • superset/db_engine_specs/base.py
    • superset/db_engine_specs/elasticsearch.py
    • superset/models/helpers.py
    • superset/views/datasource/utils.py
    • tests/unit_tests/databases/test_schemas.py
    • tests/unit_tests/db_engine_specs/test_base.py
    • tests/unit_tests/db_engine_specs/test_elasticsearch.py
    • tests/unit_tests/models/test_helpers_offset.py
    • tests/unit_tests/views/datasource/test_utils.py
  • Files skipped - 1
    • UPDATING.md - Reason: Filter setting
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful


Black-style reformatting and trivial refactors.
"""
source = HELPERS_PATH.read_text()
assert "supports_offset" in source, (

Use of assert in production code

Replace assert statement with explicit error handling using pytest.fail() or raise an exception, as assert can be disabled with Python's -O flag.

Code suggestion
Check the AI-generated fix before applying
  from pathlib import Path
 +import pytest
 
  HELPERS_PATH = (
      Path(__file__).resolve().parents[3] / "superset" / "models" / "helpers.py"
 @@ -54,10 +55,10 @@
      """
      source = HELPERS_PATH.read_text()
 -    assert "supports_offset" in source, (
 +    if "supports_offset" not in source:
 +        pytest.fail(
          "helpers.py no longer references supports_offset; the OFFSET "
          "guard is gone — Elasticsearch drill-to-detail will crash on page 2+."
 -    )
 +        )

Code Review Run #0b865c



Comment on lines +316 to +317
@pytest.mark.parametrize(
"sql_in,expected_query",

Wrong parametrize argument type

Change the first argument to pytest.mark.parametrize on line 317 from a string to a tuple. Use ("sql_in", "expected_query") instead of "sql_in,expected_query".

Code suggestion
Check the AI-generated fix before applying
Suggested change
@pytest.mark.parametrize(
"sql_in,expected_query",
("sql_in", "expected_query"),

Code Review Run #0b865c



headers=json_headers,
body={"cursor": cursor},
)
except Exception: # pylint: disable=broad-except

Blind exception catch should be specific

Replace the broad Exception catch on line 121 with a specific exception type (e.g., ConnectionError, TimeoutError, or RequestException) to avoid masking unexpected errors.

Code suggestion
Check the AI-generated fix before applying
Suggested change
 -except Exception: # pylint: disable=broad-except
 +except (ConnectionError, TimeoutError, Exception): # pylint: disable=broad-except

Code Review Run #0b865c
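
One reading of the hunk above is that closing the cursor is best-effort cleanup whose failure should not mask the page that was already fetched. A minimal sketch under that assumption follows; `close_cursor_quietly` is a hypothetical helper, not the function in `elasticsearch.py`, and the logging behaviour is an illustration only.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def close_cursor_quietly(
    perform_request: Callable[..., Any],
    close_path: str,
    cursor: Optional[str],
) -> bool:
    """Try to close a server-side cursor; return False if the close failed."""
    if not cursor:
        return True  # nothing to release
    try:
        perform_request(
            "POST",
            close_path,
            headers={"Content-Type": "application/json"},
            body={"cursor": cursor},
        )
    except Exception:  # pylint: disable=broad-except
        # Cleanup only: log and carry on so the fetched rows are still returned.
        logger.warning("Failed to close cursor at %s", close_path, exc_info=True)
        return False
    return True
```

Narrowing the `except` clause would need knowledge of which exceptions the underlying transport raises, which this sketch does not assume.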



Projects

None yet

Development

Successfully merging this pull request may close these issues.

Drill by pagination does not work with Elasticsearch

1 participant