BigQuery: update column descriptions on existing columns #3881
Open
vriken wants to merge 3 commits into dlt-hub:devel
dlt currently only applies column `description` hints when creating a table (CREATE TABLE) or adding new columns (ALTER TABLE ADD COLUMN). If descriptions are added to the schema after the table exists, they are never propagated to BigQuery.

This adds a new overridable hook `_alter_existing_column_hints_sql` in SqlJobClientBase (returns [] by default) and implements it in the BigQuery client to emit ALTER TABLE ... ALTER COLUMN ... SET OPTIONS statements for columns whose descriptions have changed. This is a metadata-only change: no data is modified.

Fixes dlt-hub#3879

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
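As a rough illustration of the hook pattern described in this commit, the sketch below uses stand-in classes rather than the real dlt ones. The class names, the `build_schema_update_sql` method, the hook's parameter list, and the dict-based column shape are all assumptions made for the example; only the hook name `_alter_existing_column_hints_sql` and its empty-list default come from the commit message.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a dlt column schema: just a name and an optional description.
Column = Dict[str, str]


class BaseClientSketch:
    """Stand-in for SqlJobClientBase; only the hook and a simplified call site are modelled."""

    def _alter_existing_column_hints_sql(
        self, table_name: str, columns: Sequence[Column]
    ) -> List[str]:
        # Default: destinations that do not override the hook emit no statements.
        return []

    def build_schema_update_sql(
        self, table_name: str, columns: Sequence[Column], table_exists: bool
    ) -> List[str]:
        # Hypothetical simplification of the call site: the hook only runs for existing tables.
        if not table_exists:
            return ["-- CREATE TABLE statement omitted in this sketch"]
        return self._alter_existing_column_hints_sql(table_name, columns)


class BigQueryClientSketch(BaseClientSketch):
    """Stand-in for the BigQuery client override (no diffing or escaping here)."""

    def _alter_existing_column_hints_sql(
        self, table_name: str, columns: Sequence[Column]
    ) -> List[str]:
        return [
            f"ALTER TABLE {table_name} ALTER COLUMN {c['name']} "
            f"SET OPTIONS (description='{c['description']}')"
            for c in columns
            if c.get("description")
        ]
```

With this shape, a destination that does not override the hook keeps its current behavior, since the default contributes nothing to the generated SQL.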
Address review feedback:

- Fetch current column descriptions from BigQuery via get_table() API and only emit ALTER COLUMN SET OPTIONS when they actually differ
- Handle description removal: emit SET OPTIONS(description=NULL) when a description is removed from the schema but still exists in BQ
- Use get_table_columns without include_incomplete (complete columns only)
- Add tests: diff skips unchanged, removal emits NULL, special char escaping
- Use instance method replacement instead of unittest.mock

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Description
dlt applies column `description` hints to BigQuery only when creating a table (`CREATE TABLE`) or adding new columns (`ALTER TABLE ADD COLUMN`). If descriptions are added to the schema after the table already exists, they are never propagated to the destination on subsequent pipeline runs.

This adds a new overridable hook `_alter_existing_column_hints_sql` in `SqlJobClientBase` (returns `[]` by default), called from `_build_schema_update_sql` for existing tables. The BigQuery implementation:

- Fetches current column descriptions via the `get_table()` API
- Emits `ALTER COLUMN SET OPTIONS` only when descriptions actually differ
- Handles description removal by emitting `SET OPTIONS(description=NULL)`
- Considers complete columns only (no `include_incomplete`)

This is metadata-only: no data is modified.
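The diff-based behavior described above could look roughly like the following sketch. It is not the PR's code: the function names, the dict-based inputs, and the escaping rule (backslash-escaping backslashes and single quotes for a single-quoted BigQuery string literal) are assumptions chosen for illustration. In the PR the current descriptions are fetched through `get_table()`; here they are simply passed in.

```python
from typing import Dict, List, Optional


def escape_bq_string(value: str) -> str:
    # Assumed escaping for a single-quoted literal: backslashes first, then single quotes.
    return value.replace("\\", "\\\\").replace("'", "\\'")


def description_diff_sql(
    table: str,
    desired: Dict[str, Optional[str]],
    current: Dict[str, Optional[str]],
) -> List[str]:
    """Emit one ALTER COLUMN statement per existing column whose description differs."""
    statements: List[str] = []
    for column, new_desc in desired.items():
        if column not in current:
            # Columns missing from the destination are covered by ADD COLUMN, not by this diff.
            continue
        # Treat empty strings and None alike: both mean "no description".
        old_desc = current[column] or None
        new_desc = new_desc or None
        if new_desc == old_desc:
            continue
        # A removed description becomes description=NULL.
        literal = "NULL" if new_desc is None else f"'{escape_bq_string(new_desc)}'"
        statements.append(
            f"ALTER TABLE {table} ALTER COLUMN `{column}` SET OPTIONS (description={literal})"
        )
    return statements
```

Comparing against the destination's current state keeps repeated pipeline runs from re-issuing statements for columns that are already up to date.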
Note: Snowflake and Databricks have the same gap: both apply column descriptions/comments on `CREATE`/`ADD COLUMN` only (via `COMMENT` syntax). The `_alter_existing_column_hints_sql` hook is designed for them to override as well, but this PR only implements and tests BigQuery since that's the destination I can verify against.

Related Issues

Fixes dlt-hub#3879
Additional Context
Files changed:
- `dlt/destinations/job_client_impl.py`: base class hook + call site in `_build_schema_update_sql`
- `dlt/destinations/impl/bigquery/bigquery.py`: BigQuery diff-based implementation
- `tests/load/bigquery/test_bigquery_table_builder.py`: 6 unit tests (changed, unchanged, removal, new columns, escaping)
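For readers unfamiliar with the "instance method replacement instead of `unittest.mock`" approach mentioned in the review follow-up, here is a minimal sketch of that testing style. The `ClientSketch` class, its `_fetch_current_descriptions` method, and the test names are hypothetical and do not mirror the actual tests in the PR.

```python
from typing import Dict, List, Optional


class ClientSketch:
    """Hypothetical client; the real one would query BigQuery for current descriptions."""

    def _fetch_current_descriptions(self, table: str) -> Dict[str, Optional[str]]:
        raise RuntimeError("would call the BigQuery API")

    def alter_sql(self, table: str, desired: Dict[str, str]) -> List[str]:
        current = self._fetch_current_descriptions(table)
        return [
            f"ALTER TABLE {table} ALTER COLUMN {col} SET OPTIONS (description='{desc}')"
            for col, desc in desired.items()
            if col in current and current[col] != desc
        ]


def test_unchanged_description_is_skipped() -> None:
    client = ClientSketch()
    # Assigning to the instance shadows the class method; no `self` is passed to the lambda.
    client._fetch_current_descriptions = lambda table: {"id": "primary key"}
    assert client.alter_sql("t", {"id": "primary key"}) == []


def test_changed_description_emits_alter() -> None:
    client = ClientSketch()
    client._fetch_current_descriptions = lambda table: {"id": "old text"}
    assert client.alter_sql("t", {"id": "new text"}) == [
        "ALTER TABLE t ALTER COLUMN id SET OPTIONS (description='new text')"
    ]
```

The replacement only affects the one instance under test, so no patch context or cleanup is needed.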