```python
    table_name: Optional[str],
    quote: bool = True,
    casefold: bool = True,
    dataset_name: Optional[str] = None,
```
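For context, the signature above suggests a helper that builds dataset-qualified table names with an optional dataset override. A minimal sketch of what such a helper might do; the function name, the placeholder default dataset, and the simplified lowercasing rule are illustrative assumptions, not the actual implementation:

```python
from typing import Optional


def make_qualified_table_name(
    table_name: Optional[str],
    quote: bool = True,
    casefold: bool = True,
    dataset_name: Optional[str] = None,
) -> str:
    """Build a dataset-qualified table name; dataset_name overrides the default."""
    dataset = dataset_name or "my_dataset"  # placeholder default dataset
    parts = [p for p in (dataset, table_name) if p]
    if casefold:
        # real destinations apply their own casing rules; lowercasing is a stand-in
        parts = [p.lower() for p in parts]
    if quote:
        parts = [f'"{p}"' for p in parts]
    return ".".join(parts)


print(make_qualified_table_name("Events"))  # "my_dataset"."events"
```

The catalog override discussed in the review would simply add one more leading part to the joined identifier.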
---

I noticed you allow passing catalogs into sqlglot schema qualification, so maybe also allow a catalog override here as well?

---

Could you elaborate on the use of catalog here? Correct me if I'm wrong: as I see it, create_sqlglot_schema builds {dataset_name: {table: columns}}, so the catalog level is always empty.

---

This function was intended to generate fully qualified table names using the dataset name and catalog name on self. Now we can also use it with an arbitrary dataset name, so why not a catalog name as well? bind_query may need it, but you are right: there's no such case now. I see this only for the filesystem case, where we have many duckdb databases attached to represent foreign datasets; those are visible as catalogs that need to be added during bind_query.

---

tl;dr: ignore this comment for now.
```python
        f"'{target_dataset.dataset_name}' vs '{self._dataset.dataset_name}'"
    )
# cross-dataset filesystem not supported
if isinstance(self.sql_client, WithSchemas):
```
---

This is a good follow-up ticket. If we use duckdb ATTACH, we will be able to join datasets on separate physical locations, i.e. joining Lance and HF tables will be possible.
```python
# physical destination check
if target_dataset is not self._dataset:
    if not self._dataset.is_same_physical_destination(target_dataset):
```
---

Our physical destination check is currently only half implemented:
#3758
For now this is FYI, but we can add it to our estimations.
```python
    kind=kind,
)
else:
    if target_dataset is not self._dataset:
```
---

Hmm, this line basically answers whether the dataset is foreign or not. I'd probably create an is_foreign_dataset method on Dataset that answers that, given the "other" dataset:
- if foreign, we do what you do here
- if not, we add the schemas from "other" that are not present in "self" to the local schemas in "self"

A dataset is foreign when it has:
- a different dataset name
- the same name but a different physical location; but those can't be joined :) so I think we are lucky with the name comparison

(We could also compare catalogs if the destination client supports them, but IMO that can be deferred until we deal with filesystem foreign joins.)
---

Yes, the identity check is wrong here. The dataset name is better, but won't it produce false negatives for case-insensitive destinations?

---

Maybe something like (destination_fingerprint, effective_dataset_name), in the spirit of #3758.

---

I'd just compare dataset names for now. _resolve_join_target already checks whether the physical locations are the same (#3758 addresses joinability explicitly, so this check will become even more sound). IMO this covers all practical cases.

Once #3758 is implemented we will be able to use:
- can_join_with to check whether we can join at all
- location() + dataset_name for the identity check
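The identity scheme discussed above, (destination_fingerprint, effective_dataset_name) with casefolding to avoid false negatives on case-insensitive destinations, could be sketched like this. The Dataset class here is a hypothetical stand-in, not the actual dlt implementation:

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    destination_fingerprint: str
    dataset_name: str
    caps_casefold: bool = False  # True for case-insensitive destinations

    def effective_dataset_name(self) -> str:
        # normalize so case-insensitive destinations compare equal regardless of casing
        return self.dataset_name.casefold() if self.caps_casefold else self.dataset_name

    def is_foreign_dataset(self, other: "Dataset") -> bool:
        # identity = (destination fingerprint, effective dataset name)
        return (self.destination_fingerprint, self.effective_dataset_name()) != (
            other.destination_fingerprint,
            other.effective_dataset_name(),
        )
```

Two datasets that differ only in name casing on a case-insensitive destination would then not be treated as foreign, while a different fingerprint or a genuinely different name would.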
Closes #3747