
[WIP] Explicit joins in Relation.join()#3868

Open
burnash wants to merge 2 commits into devel from feat/3747-explicit-joins

Conversation

@burnash (Collaborator) commented Apr 15, 2026

Closes #3747

@burnash burnash changed the base branch from feat/3403-relation-join to devel April 15, 2026 12:55
@burnash burnash changed the base branch from devel to feat/3403-relation-join April 15, 2026 12:55
@cloudflare-workers-and-pages Bot commented Apr 15, 2026

Deploying with Cloudflare Workers.

✅ Deployment successful: docs, commit 5b2b259 (preview updated Apr 23 2026, 04:07 PM UTC)

@burnash burnash changed the base branch from feat/3403-relation-join to devel April 15, 2026 15:41
table_name: Optional[str],
quote: bool = True,
casefold: bool = True,
dataset_name: Optional[str] = None,
Collaborator
I noticed you allow passing catalogs into the sqlglot schema qualification, so maybe also allow a catalog override here as well?

Collaborator Author

could you elaborate on the use of catalog here? Correct me if I'm wrong: from what I see, create_sqlglot_schema builds {dataset_name: {table: columns}}, so the catalog is always empty.

Collaborator

This function was intended to generate fully qualified table names using the dataset name and catalog name on self. Now we can also use it with an arbitrary dataset name, so why not a catalog name as well? bind_query may need it, but you are right, there's no such case now. I see this only for the filesystem case, where we have many duckdb databases attached to represent foreign datasets; those are visible as catalogs that need to be added during bind_query.

Collaborator

tl;dr: ignore this comment for now

Comment thread dlt/dataset/relation.py
f"'{target_dataset.dataset_name}' vs '{self._dataset.dataset_name}'"
)
# cross-dataset filesystem not supported
if isinstance(self.sql_client, WithSchemas):
Collaborator

this is a good follow-up ticket. if we use duckdb ATTACH we will be able to join datasets in separate physical locations, i.e. joining Lance and HF tables will be possible

Comment thread dlt/dataset/relation.py

# physical destination check
if target_dataset is not self._dataset:
if not self._dataset.is_same_physical_destination(target_dataset):
Collaborator

our physical destination check is currently only half implemented:
#3758

for now this is FYI, but we can add it to our estimates

Comment thread dlt/dataset/relation.py Outdated
kind=kind,
)
else:
if target_dataset is not self._dataset:
Collaborator

hmm, this line basically answers whether the dataset is foreign or not. I'd probably create is_foreign_dataset in Dataset that answers that given the "other" dataset.

  1. if foreign - we do what you do here
  2. if not - we add schemas from "other" to local schemas in "self" - those that are not present in "self"

now, when is a dataset foreign:

  • different dataset name
  • same name but different physical location - but those can't be joined :) so I think we are lucky with the name comparison
    (we could also compare catalogs if the destination client supports them - but IMO that can be deferred until we deal with filesystem foreign joins)

Collaborator Author

Yes, the identity check is wrong here. Dataset name is better, but won't it produce false negatives for case-insensitive destinations?

Collaborator Author

maybe something like (destination_fingerprint, effective_dataset_name), in the spirit of #3758
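The identity tuple proposed above, with the casefolding needed to avoid false negatives on case-insensitive destinations, could look roughly like this. DatasetIdentity, its fields, and is_foreign_dataset are hypothetical names for illustration, not dlt's API:

```python
# Sketch: dataset identity as (destination_fingerprint, effective_dataset_name),
# casefolding the name when the destination is case-insensitive.
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetIdentity:
    destination_fingerprint: str  # stable id of the physical destination
    dataset_name: str
    case_sensitive: bool = True  # whether the destination compares names by case

    def key(self) -> tuple:
        name = self.dataset_name if self.case_sensitive else self.dataset_name.casefold()
        return (self.destination_fingerprint, name)


def is_foreign_dataset(a: DatasetIdentity, b: DatasetIdentity) -> bool:
    # foreign == not the same (destination, dataset) identity
    return a.key() != b.key()


local = DatasetIdentity("duckdb:/data/pipeline.db", "My_Dataset", case_sensitive=False)
other = DatasetIdentity("duckdb:/data/pipeline.db", "my_dataset", case_sensitive=False)
print(is_foreign_dataset(local, other))  # names differ only by case
```

The casefold is only applied when the destination itself is case-insensitive, so case-sensitive destinations still treat differently cased names as distinct datasets.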

Collaborator

I'd just compare dataset names for now. _resolve_join_target already checks whether the physical locations are the same (#3758 addresses joinability explicitly, so this check will become even more sound). IMO this covers all practical cases.

once #3758 is implemented we will be able to use:

  • can_join_with to check if we can join at all
  • location() + dataset_name for the identity check

@burnash burnash force-pushed the feat/3747-explicit-joins branch from 0c6c278 to 390d40a Compare April 16, 2026 11:38


Development

Successfully merging this pull request may close these issues.

(feat) implement explicit joins with local and foreign datasets
