fix: unquote SQL identifiers in VISUALIZE column references#450

Open
cpsievert wants to merge 1 commit into
posit-dev:mainfrom
cpsievert:fix/quoted-identifier-columns

Conversation

@cpsievert
Collaborator

Summary

Columns with dots (or other special characters) in their names fail when used in VISUALIZE mappings with quoted identifiers — e.g. "variable.dotted" AS y. The parser was storing the raw source text including surrounding double-quotes, so validation couldn't match them against the actual (unquoted) Arrow schema column names.

  • Promotes the private unquote helper from stat_aggregate.rs to naming::unquote_ident (inverse of quote_ident)
  • Applies it at parse time in builder.rs for both explicit and implicit column mappings
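The quote/unquote pair described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: only the names `quote_ident` and `unquote_ident` come from the description, and the bodies (including the doubling of embedded double-quotes) are assumptions about how such helpers typically behave.

```rust
/// Wrap a column name in double quotes for use in emitted SQL.
/// Assumption: embedded double quotes are escaped by doubling them.
fn quote_ident(name: &str) -> String {
    format!("\"{}\"", name.replace('"', "\"\""))
}

/// Inverse of `quote_ident`: strip one pair of surrounding double quotes
/// and collapse doubled quotes. Unquoted input is returned unchanged.
fn unquote_ident(ident: &str) -> String {
    if ident.len() >= 2 && ident.starts_with('"') && ident.ends_with('"') {
        // The quote character is a single byte, so byte slicing is safe here.
        ident[1..ident.len() - 1].replace("\"\"", "\"")
    } else {
        ident.to_string()
    }
}
```

Under these assumptions, the parser would store `unquote_ident("\"variable.dotted\"")`, i.e. `variable.dotted`, which matches the Arrow schema name, while any SQL emitted later can still pass the stored name back through `quote_ident`.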

Fixes posit-dev/ggsql-python#3

Repro

duckdb reproduce.db -c "CREATE TABLE test AS SELECT 1 AS numbers, 3 AS \"variable.dotted\" UNION ALL SELECT 2, 4"
ggsql exec --reader "duckdb://reproduce.db" \
  'SELECT * FROM test VISUALIZE numbers AS x, "variable.dotted" AS y DRAW line'

Before: Validation error: Layer 1: aesthetic 'pos2' references non-existent column '"variable.dotted"'
After: renders successfully

Test plan

  • New unit tests for unquote_ident (basic cases + quote/unquote roundtrip)
  • All existing tests pass (cargo test --lib naming — 32 passed)
  • End-to-end CLI verification with dotted column name

Quoted identifiers like "variable.dotted" were stored with their
surrounding double-quotes intact, causing validation to fail when
matching against Arrow schema column names (which are unquoted).

Promotes the private unquote helper from stat_aggregate to a public
naming::unquote_ident and applies it at parse time in builder.rs.

Fixes posit-dev/ggsql-python#3
@georgestagg
Collaborator

georgestagg commented May 14, 2026

I haven’t had a chance to look at this properly, but I wanted to drop a quick drive-by comment: we must be careful to keep quoting identifiers in emitted SQL queries in order to support the Snowflake engine, where unquoted identifiers are automatically converted to uppercase (I know…).

This PR might not touch that, but I just want to be sure that affected columns remain quoted in SQL queries we emit.

I think dots in identifiers are always going to be flaky here, since SQL uses dots to separate database, schema and table names. So, for example, variable.dotted (and "variable"."dotted") is also grammatically valid in this slot but means something different. We should be sure our grammar and processing handle that appropriately.
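To make the distinction concrete, here is a rough sketch of splitting a reference only on dots that fall outside double quotes. `split_qualified` is a hypothetical helper written for illustration; it is not part of ggsql or of this PR, and the real grammar may handle this differently.

```rust
/// Hypothetical helper: split a possibly qualified identifier on dots that
/// sit outside double quotes, returning each part with its quotes removed.
fn split_qualified(ident: &str) -> Vec<String> {
    let mut parts = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;
    let mut chars = ident.chars().peekable();

    while let Some(c) = chars.next() {
        if c == '"' {
            if in_quotes && chars.peek() == Some(&'"') {
                // A doubled quote inside a quoted part is an escaped quote.
                current.push('"');
                chars.next();
            } else {
                // Opening or closing quote: toggle state, drop the character.
                in_quotes = !in_quotes;
            }
        } else if c == '.' && !in_quotes {
            // An unquoted dot separates qualifier parts.
            parts.push(std::mem::take(&mut current));
        } else {
            current.push(c);
        }
    }
    parts.push(current);
    parts
}
```

With this sketch, "variable.dotted" yields a single part (one column name containing a dot), whereas variable.dotted and "variable"."dotted" each yield two parts (a qualifier and a column).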


Development

Successfully merging this pull request may close these issues.

Issue with columns that have a period in their name