
Implement incremental filtering for dlt.Relation #3889

Open
burnash wants to merge 18 commits into devel from feat/3750-relation-incremental

Conversation

@burnash
Collaborator

@burnash burnash commented Apr 23, 2026

Closes #3750

@burnash burnash self-assigned this Apr 23, 2026
@burnash burnash added the enhancement New feature or request label Apr 23, 2026
@burnash burnash marked this pull request as draft April 23, 2026 14:00
@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Apr 23, 2026

Deploying with Cloudflare Workers

✅ Deployment successful! docs - commit 95d3ef4 - May 08 2026, 11:44 AM (UTC)
View logs

@burnash burnash marked this pull request as ready for review April 30, 2026 12:42
@burnash burnash changed the title [WIP] Implement incremental filtering for dlt.Relation Implement incremental filtering for dlt.Relation Apr 30, 2026
@burnash burnash requested a review from rudolfix April 30, 2026 12:43
Comment thread dlt/dataset/relation.py Outdated
@property
def is_incremental(self) -> bool:
"""True if any clause on this relation was produced by `.incremental()`."""
# TODO: leaks True on aggregate relations because the inner subquery still
Collaborator Author

The issue (#3750) recommends storing this as a flag in sqlglot meta, but this IMO has downsides: there would no longer be a single source of truth, and `_incremental_ctx` would also have to survive transformations. So I'd drop the flag.

Collaborator

@rudolfix rudolfix left a comment

this looks pretty good! 99% of use cases will be datetime cursors (this includes _dlt_load_id, which we'll join to _dlt_loads to use inserted_at as the cursor - that's why we need this autojoin)

  1. it makes sense to reuse parts of to_sqlglot_filter - it got tested across many destinations, which surfaced sqlglot problems with datetime literals
  • taking a sqlglot Column and type is a good idea - better than to_sqlglot_filter
  • lower, upper = incremental.resolve_bounds(apply_lag=apply_lag) - this gets the correct range from the incremental - also from ad hoc instances (not only inside a resource), i.e. without end_value set. (I added this exactly due to this problem)
  • the code that formats datetime literals depending on destination caps should be ported
  • there are unit tests: you'll probably have your own
  • the end to end tests in test_read_interfaces are worth preserving - they test datetime literals on all destinations - it was not easy to make this work on all of them
  2. OK to drop this meta tag for now... let's see when we need it.
  3. we may still want to expose something similar to to_sqlglot_filter from the _incremental module - for users who want to generate expressions themselves and pass them to where()
  4. _parse_incremental_cursor_path - this will be pretty useful. let's sync on those max/min values - datetime values will come mostly from external schedulers but we should still support "standalone" incrementals
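The bounds-then-literal flow described above can be sketched as two small functions. `resolve_bounds_sketch` and `format_filter` below are hypothetical stand-ins (the real code lives in the `_incremental` module and consults destination capabilities for literal syntax), showing only the shape of the logic:

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

def resolve_bounds_sketch(
    last_value: Optional[datetime],
    end_value: Optional[datetime],
    lag: float = 0.0,
) -> Tuple[Optional[datetime], Optional[datetime]]:
    # stand-in for incremental.resolve_bounds(apply_lag=...): the lower bound
    # is moved back by the lag window; the upper bound may be None for ad hoc
    # instances without end_value set
    lower = last_value - timedelta(seconds=lag) if last_value is not None else None
    return lower, end_value

def format_filter(column: str, lower: Optional[datetime],
                  upper: Optional[datetime], range_start: str = "closed") -> str:
    # naive datetime-literal formatting; the real code varies the literal
    # syntax per destination, which is exactly what made it hard to get right
    parts = []
    if lower is not None:
        op = ">=" if range_start == "closed" else ">"
        parts.append(f"{column} {op} TIMESTAMP '{lower.isoformat(sep=' ')}'")
    if upper is not None:
        parts.append(f"{column} < TIMESTAMP '{upper.isoformat(sep=' ')}'")
    return " AND ".join(parts)

lower, upper = resolve_bounds_sketch(datetime(2026, 1, 2, 12, 0, 0), None, lag=3600.0)
print(format_filter("updated_at", lower, upper))
# updated_at >= TIMESTAMP '2026-01-02 11:00:00'
```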

def test_incremental_dotted_cursor_runtime_columns_base_only(
incremental_dataset: dlt.Dataset,
) -> None:
inc: dlt.sources.incremental[Any] = dlt.sources.incremental(
Collaborator

very cool! this will require an end to end test (processing subsequent package ids)

@burnash burnash requested a review from rudolfix May 7, 2026 16:20
Collaborator

@rudolfix rudolfix left a comment

ModalIncremental looks good - see the other PR for more details on that.
I've found two places where incomplete columns are allowed (a raw read of columns from the table schema). The other is an old one in the autojoin (dlt/dataset/_join.py:316). I included two tests that surface those.

at this point we IMO need at least one smoke test for Incremental working on all destinations - there's still no end to end test that runs a pipeline a few times and observes the right data being selected via the incremental
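The invariant such an end to end test would assert can be simulated without any destination at all: run twice, and the second run must only pick up rows past the cursor advanced by the first. A toy sketch (the `run_once` helper and the `updated_at` cursor are illustrative, not dlt API):

```python
from typing import Any, Dict, List

def run_once(rows: List[Dict[str, Any]], state: Dict[str, Any]) -> List[Dict[str, Any]]:
    # select only rows past the cursor stored in state, then advance it -
    # the behavior the smoke test would verify across pipeline runs
    last = state.get("last_value")
    selected = [r for r in rows if last is None or r["updated_at"] > last]
    if selected:
        state["last_value"] = max(r["updated_at"] for r in selected)
    return selected

state: Dict[str, Any] = {}
first = run_once([{"id": 1, "updated_at": 10}, {"id": 2, "updated_at": 20}], state)
second = run_once([{"id": 2, "updated_at": 20}, {"id": 3, "updated_at": 30}], state)
assert [r["id"] for r in first] == [1, 2]
assert [r["id"] for r in second] == [3]  # the overlapping row is not re-selected
```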

"""

def __call__(self, relation: TDataItem) -> Tuple[Optional[TDataItem], bool, bool]:
ctx = getattr(relation, "_incremental_ctx", None)
Collaborator

please check the dlthub review - here we should apply the incremental automatically if it was not yet applied
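The auto-apply suggested above could look roughly like this. `FakeRelation` and `apply_incremental` are hypothetical stubs standing in for `dlt.Relation` and its incremental application path; only the dispatch pattern is the point:

```python
from typing import Any, Optional, Tuple

class FakeRelation:
    # minimal stand-in for dlt.Relation, just enough to show the dispatch
    def __init__(self) -> None:
        self._incremental_ctx: Optional[object] = None

    def apply_incremental(self, incremental: Any) -> "FakeRelation":
        # hypothetical method name; the real code goes through the
        # relation's incremental/where machinery
        rel = FakeRelation()
        rel._incremental_ctx = incremental
        return rel

class IncrementalTransform:
    def __init__(self, incremental: Any) -> None:
        self.incremental = incremental

    def __call__(self, relation: FakeRelation) -> Tuple[Optional[FakeRelation], bool, bool]:
        ctx = getattr(relation, "_incremental_ctx", None)
        if ctx is None:
            # apply the incremental automatically when not yet applied,
            # instead of passing the relation through unfiltered
            relation = relation.apply_incremental(self.incremental)
        return relation, True, False

transform = IncrementalTransform(incremental="inc")
rel, _, _ = transform(FakeRelation())
assert rel._incremental_ctx == "inc"
```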

list inline so the join qualifier resolves.
"""
if ctx.incremental.end_value is None and (
base_query.args.get("limit") is not None or base_query.args.get("order") is not None
Collaborator

what is wrong with "ORDER"? ORDER BY alone returns all rows and is acceptable
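The distinction the reviewer draws can be reduced to a single predicate: adding a WHERE on top of a LIMIT-ed query changes which rows the limit keeps, so only open-ended incrementals over LIMIT are unsafe, while ORDER BY alone still returns every row. A sketch over a plain args dict (mirroring the `base_query.args.get(...)` check, not real sqlglot):

```python
from typing import Any, Dict

def rejects_open_ended(query_args: Dict[str, Any], has_end_value: bool) -> bool:
    # a WHERE injected above a LIMIT changes which rows the limit keeps,
    # so an open-ended incremental (no end_value) over LIMIT is unsafe;
    # ORDER BY alone still returns every row, so it need not be rejected
    return not has_end_value and query_args.get("limit") is not None

assert rejects_open_ended({"limit": 10}, has_end_value=False) is True
assert rejects_open_ended({"order": "updated_at"}, has_end_value=False) is False
assert rejects_open_ended({"limit": 10}, has_end_value=True) is False
```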

Comment thread dlt/dataset/relation.py
`last_value_func` is not `min` or `max`.

Notes:
Aggregate (GROUP BY) cursors with `range_start="open"`: late
Collaborator

  1. This is a rare edge case - worth describing - but maybe in our docs. The Relation docs are not yet updated - it belongs there
  2. the case described here is intentional: range_start="open" is an explicit setting and it means "ignore late arriving data at the boundary". so this is intentional, correct behavior

Comment thread dlt/dataset/relation.py
)
return rel

if not self._table_name:
Collaborator

you could easily extract the part below into a helper function and reuse the incremental application logic in the

if table_name is None:

path. The incremental building, like:

sqlglot_type = _sqlglot_type_for_column(columns, column_name)
_maybe_warn_on_cursor_missing_raise(incremental, columns, column_name)
condition = _build_incremental_condition(incremental, column_ref, sqlglot_type)

rel = self.__copy__()
rel._sqlglot_expression = query.where(condition) if condition is not None else query
rel._incremental_ctx = _RelationIncrementalContext(
    incremental=incremental,
    cursor_column=column_ref.copy(),
)
return rel

is identical for both

Comment thread dlt/dataset/relation.py
this=sge.to_identifier(column_name, quoted=True),
table=sge.to_identifier(target_qualifier, quoted=False),
)
target_columns = self._dataset.schema.tables[table_name].get("columns", {})
Collaborator

  1. materialized tables have columns
  2. this expression also returns incomplete / not materialized columns.
  3. use get_table_columns

having such columns is a byproduct of letting our users define columns partially (i.e. to say "a column with this name should be null" without setting the data type). maybe we should clean up schemas when they are used in the Relation class... but that's not something we can do now
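The incomplete-column hazard can be shown with a toy filter: a partially defined column (name and nullability, no data type) must not be treated as materialized. This is an illustrative sketch of what a `get_table_columns`-style accessor guards against, not dlt's actual implementation:

```python
from typing import Any, Dict

def get_complete_columns(table: Dict[str, Any]) -> Dict[str, Any]:
    # a column is treated as complete only if it carries a data type;
    # users may define columns partially, and a raw read of
    # table["columns"] would return those incomplete entries too
    return {
        name: col
        for name, col in table.get("columns", {}).items()
        if col.get("data_type") is not None
    }

table = {
    "columns": {
        "id": {"data_type": "bigint"},
        "updated_at": {"data_type": "timestamp"},
        "maybe_null": {"nullable": True},  # partial definition, no data type
    }
}
assert set(get_complete_columns(table)) == {"id", "updated_at"}
```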

"""Gets transform implementation that handles particular data item type"""
# Lazy import to avoid failure with a partially-initialised
# `dlt.extract` during dlt startup.
# TODO: we should consider creating a registry for transforms
Collaborator

I agree. we are reworking item detection in other pull requests. there should be a universal dispatch function (well - it is there but not used) that will return the item type, from which the right incremental type could be derived
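A registry replacing the lazy-import dispatch could look like the following. The names (`register_transform`, `get_transform`) are hypothetical; only the pattern of mapping item types to transform implementations is the point:

```python
from typing import Any, Callable, Dict, Type

# hypothetical registry mapping item types to transform implementations,
# instead of lazy imports scattered through the dispatch code
_TRANSFORMS: Dict[Type[Any], type] = {}

def register_transform(item_type: Type[Any]) -> Callable[[type], type]:
    def decorator(cls: type) -> type:
        _TRANSFORMS[item_type] = cls
        return cls
    return decorator

def get_transform(item: Any) -> Any:
    # derive the right transform implementation from the item's type
    for item_type, cls in _TRANSFORMS.items():
        if isinstance(item, item_type):
            return cls()
    raise TypeError(f"no transform registered for {type(item).__name__}")

@register_transform(list)
class ListTransform:
    pass

assert isinstance(get_transform([1, 2]), ListTransform)
```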


Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

(feat) use Incremental instances as Relation filters

2 participants