[python] Support snapshot_id and tag_name in Ray read_paimon API #7802

Merged
JingsongLi merged 1 commit into apache:master from TheR1sing3un:py-ray-snapshot-tag
May 11, 2026

@TheR1sing3un
Member

Purpose

Add time-travel support to the top-level pypaimon.ray.read_paimon API,
so a Ray scan can read a specific snapshot id or a named tag.

Why

Before this PR, read_paimon always read the latest snapshot — there
was no way to reproduce a scan against a fixed point in history through
the recommended public facade. Internally pypaimon already understood
scan.tag-name (added with #7243), but the matching scan.snapshot-id
plumbing was missing on the Python side even though the option exists in
Java's CoreOptions.SCAN_SNAPSHOT_ID.

What changed

Public API, pypaimon/ray/ray_paimon.py:

  • read_paimon(..., snapshot_id=None, tag_name=None)
  • Both are keyword-only and mutually exclusive (ValueError if both set)
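
A minimal sketch of the keyword-only, mutually exclusive contract described above. `read_paimon_sketch` and `_time_travel_options` are hypothetical stand-ins, not pypaimon's implementation: only the parameter names `snapshot_id` / `tag_name` and the option keys `scan.snapshot-id` / `scan.tag-name` come from this PR. The positional `table_identifier` argument, the string-valued options, and the returned dict are assumptions made for illustration.

```python
from typing import Dict, Optional


def _time_travel_options(snapshot_id: Optional[int],
                         tag_name: Optional[str]) -> Dict[str, str]:
    """Map the two time-travel arguments to scan options (hypothetical helper)."""
    if snapshot_id is not None and tag_name is not None:
        # Mirrors the guard described in the PR: setting both is an error.
        raise ValueError("snapshot_id and tag_name are mutually exclusive")
    options: Dict[str, str] = {}
    if snapshot_id is not None:
        # Assumption: option values are passed as strings.
        options["scan.snapshot-id"] = str(snapshot_id)
    if tag_name is not None:
        options["scan.tag-name"] = tag_name
    return options


def read_paimon_sketch(table_identifier: str, *,
                       snapshot_id: Optional[int] = None,
                       tag_name: Optional[str] = None) -> dict:
    """Hypothetical stand-in: returns the scan plan instead of a Ray Dataset."""
    return {
        "table": table_identifier,
        "options": _time_travel_options(snapshot_id, tag_name),
    }
```

The bare `*` in the signature is what makes both arguments keyword-only, so a call such as `read_paimon_sketch("db.t", 3)` fails with a `TypeError` instead of silently binding `3` to `snapshot_id`.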

Backing plumbing:

  • pypaimon/read/datasource/split_provider.py: CatalogSplitProvider
    takes the two new fields and applies them in _ensure_table via
    table.copy({"scan.snapshot-id": ..., "scan.tag-name": ...}). It repeats
    the mutual-exclusion guard as a defense-in-depth layer.
  • pypaimon/common/options/core_options.py: new SCAN_SNAPSHOT_ID
    config (long type, no default), aligned with Java's
    CoreOptions.SCAN_SNAPSHOT_ID.
  • pypaimon/snapshot/time_travel_util.py: try_travel_to_snapshot now
    accepts a snapshot_manager and resolves scan.snapshot-id against it.
  • pypaimon/read/table_scan.py: _create_file_scanner routes
    SCAN_SNAPSHOT_ID through snapshot_manager.get_snapshot_by_id +
    manifest_list_manager.read_all, mirroring the existing
    SCAN_TAG_NAME branch.
  • Existing TimeTravelUtil callers (full-text scan, vector search scan)
    are updated to pass the snapshot manager.
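
The resolution path described in the list above could look roughly like the sketch below. `Snapshot`, `InMemorySnapshotManager`, and `resolve_snapshot` are hypothetical stand-ins written for illustration; only the method name `get_snapshot_by_id` and the two option keys are taken from this PR. The error types, the `tags` mapping, and the `None` return for "no time travel requested" are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical minimal snapshot record."""
    id: int


class InMemorySnapshotManager:
    """Hypothetical stand-in for a snapshot manager backed by a dict."""

    def __init__(self, snapshots: Iterable[Snapshot]):
        self._by_id = {s.id: s for s in snapshots}

    def get_snapshot_by_id(self, snapshot_id: int) -> Optional[Snapshot]:
        # Assumption: a missing id yields None rather than raising.
        return self._by_id.get(snapshot_id)


def resolve_snapshot(options: Dict[str, str],
                     snapshot_manager: Optional[InMemorySnapshotManager] = None,
                     tags: Optional[Dict[str, Snapshot]] = None) -> Optional[Snapshot]:
    """Resolve scan.snapshot-id or scan.tag-name to a snapshot, if either is set."""
    snapshot_id = options.get("scan.snapshot-id")
    tag_name = options.get("scan.tag-name")
    if snapshot_id is not None and tag_name is not None:
        raise ValueError("scan.snapshot-id and scan.tag-name are mutually exclusive")
    if snapshot_id is not None:
        if snapshot_manager is None:
            raise ValueError("a snapshot manager is required for scan.snapshot-id")
        snapshot = snapshot_manager.get_snapshot_by_id(int(snapshot_id))
        if snapshot is None:
            raise ValueError(f"snapshot {snapshot_id} does not exist")
        return snapshot
    if tag_name is not None:
        snapshot = (tags or {}).get(tag_name)
        if snapshot is None:
            raise ValueError(f"tag {tag_name!r} does not exist")
        return snapshot
    # Neither option set: the caller falls back to the latest snapshot.
    return None
```

Checking the mutual-exclusion rule again at this layer matches the defense-in-depth idea in the PR: options that reach the scan by some route other than `read_paimon` are still validated.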

Docs, docs/content/pypaimon/ray-data.md: added a Time travel
example block and parameter docs.

Tests

  • time_travel_util_test.py (new, 6 cases): SCAN_KEYS contents,
    snapshot-id resolution, missing-id raise, missing-snapshot-manager raise,
    mutual exclusion at the util layer.
  • split_provider_test.py (+3 cases): provider-level snapshot-id /
    tag-name time travel + ctor mutual-exclusion guard.
  • ray_integration_test.py (+3 cases): read_paimon end-to-end with
    snapshot_id / tag_name, plus the public mutual-exclusion guard.

All read-path regression tests still pass (57/57 across reader-pk,
reader-append-only, projection, time-travel, ray integration).

Out of scope

  • Streaming time travel (scan.mode=from-snapshot etc.) is unchanged.
  • Java side already has CoreOptions.SCAN_SNAPSHOT_ID; no Java changes
    needed.

Lets callers time-travel a Ray scan to a specific snapshot id or a
named tag through the top-level ``read_paimon`` facade, mirroring
Java ``CoreOptions.SCAN_SNAPSHOT_ID`` / ``SCAN_TAG_NAME``. The two
arguments are mutually exclusive at both the public entry point and
the underlying ``CatalogSplitProvider``.

Also fills in the ``scan.snapshot-id`` plumbing that was missing on
the Python side: a new ``CoreOptions.SCAN_SNAPSHOT_ID`` config, a
``TimeTravelUtil`` branch resolving it via ``SnapshotManager``, and a
matching path in ``TableScan._create_file_scanner``. The two existing
callers of ``TimeTravelUtil.try_travel_to_snapshot`` (full-text scan,
vector search scan) are updated to pass the snapshot manager.

Tests: TimeTravelUtil unit tests, CatalogSplitProvider time-travel
unit tests, and Ray ``read_paimon`` integration tests covering both
snapshot-id and tag-name paths plus the mutual-exclusion guard.

@JingsongLi (Contributor) left a comment

+1

@JingsongLi merged commit 3271b21 into apache:master on May 11, 2026
7 checks passed