Dispatch and discriminate REPACK/CLUSTER (T_RepackStmt) on PG19 (#8613)#8624

Open
ihalatci wants to merge 1 commit into pg19-support from ihalatci-pg19-repack-cluster-dispatch

@ihalatci

Part of #8597 (PostgreSQL 19 support).

Addresses #8613 — dispatch and discriminate the PG19 unified REPACK/CLUSTER command (T_RepackStmt).

Note: the base branch of this PR is pg19-support (the PG19 integration branch), not main, so GitHub will not auto-close #8613 on merge — it is tracked manually.

What & why

PG19 removed ClusterStmt/T_ClusterStmt and replaced them with the unified RepackStmt, which backs three commands distinguished by RepackStmt.command: CLUSTER, the new REPACK, and VACUUM FULL. (VACUUM FULL still dispatches through T_VacuumStmt, so it never reaches this path.) The foundation PR (#8601) added a compile-only shim aliasing T_ClusterStmt → T_RepackStmt. That shim makes the names compile but does not give correct behaviour, because CLUSTER and REPACK now share a single node tag and must be told apart.
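To make the collision concrete, here is a minimal standalone sketch of routing on the node tag first and on the command field second. Every `Sketch`-prefixed type, enum value, and function is a hypothetical stand-in invented for illustration; the real definitions live in the PostgreSQL and Citus headers and may differ in names and layout.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the PG19 node tag and command discriminator. */
typedef enum SketchNodeTag
{
    SKETCH_T_VacuumStmt,
    SKETCH_T_RepackStmt
} SketchNodeTag;

typedef enum SketchRepackCommand
{
    SKETCH_CMD_CLUSTER,
    SKETCH_CMD_REPACK,
    SKETCH_CMD_VACUUMFULL
} SketchRepackCommand;

typedef struct SketchRepackStmt
{
    SketchNodeTag type;          /* node tag shared by CLUSTER and REPACK */
    SketchRepackCommand command; /* tells the commands apart */
} SketchRepackStmt;

typedef enum SketchRoute
{
    SKETCH_ROUTE_NOT_HANDLED,
    SKETCH_ROUTE_CLUSTER,
    SKETCH_ROUTE_REPACK
} SketchRoute;

/*
 * The tag alone cannot distinguish CLUSTER from REPACK, so the command
 * field is consulted after the tag matches. Anything else (including a
 * VACUUM FULL, which arrives under a different tag) is not handled here.
 */
static SketchRoute
SketchRouteStatement(const SketchRepackStmt *stmt)
{
    if (stmt == NULL || stmt->type != SKETCH_T_RepackStmt)
    {
        return SKETCH_ROUTE_NOT_HANDLED;
    }

    switch (stmt->command)
    {
        case SKETCH_CMD_CLUSTER:
            return SKETCH_ROUTE_CLUSTER;

        case SKETCH_CMD_REPACK:
            return SKETCH_ROUTE_REPACK;

        default:
            return SKETCH_ROUTE_NOT_HANDLED;
    }
}
```

In this sketch both routes would end up in the same propagation code; the distinction only matters for wording and for rejecting statements that should not arrive on this path.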

This PR makes Citus dispatch and propagate the command correctly on PG19, while staying byte-for-byte regression-neutral on PG17/PG18.

Approach (Option A)

Citus propagates REPACK exactly like CLUSTER: the command is shipped to every shard placement through the existing CLUSTER code path. The worker name-relay (RelayEventExtendNames → AppendShardIdToName) mutates the parse tree in place, appending the shardId to both the target relation name and the index name (when USING INDEX is given), and ProcessUtilityParseTree then executes the mutated node. There is no deparse-to-string step, so the existing relabel localizes REPACK's names with no new deparser work. CLUSTER and REPACK are discriminated by RepackStmt.command, and the user-facing WARNING/ERROR wording is command-aware.
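The in-place relabel can be pictured with a simplified standalone sketch. It assumes the shard-local name has the form `name_shardId` and rejects names that would not fit in the buffer; the real name-relay code handles long names differently and operates on actual parse nodes. All `Sketch`-prefixed names are hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_NAMEDATALEN 64

/* Hypothetical stand-in for the two names the statement carries. */
typedef struct SketchRepackNames
{
    char relname[SKETCH_NAMEDATALEN];
    char indexname[SKETCH_NAMEDATALEN]; /* empty when no USING INDEX was given */
} SketchRepackNames;

/* Rewrite name in place as name_shardId; returns -1 if it would not fit. */
static int
SketchAppendShardId(char *name, size_t size, uint64_t shardId)
{
    char extended[SKETCH_NAMEDATALEN];
    int written = snprintf(extended, sizeof(extended), "%s_%" PRIu64,
                           name, shardId);

    if (written < 0 || (size_t) written >= sizeof(extended) ||
        (size_t) written >= size)
    {
        return -1;
    }

    memcpy(name, extended, (size_t) written + 1);
    return 0;
}

/* Extend the relation name, and the index name only when one is present. */
static int
SketchExtendNames(SketchRepackNames *stmt, uint64_t shardId)
{
    if (SketchAppendShardId(stmt->relname, sizeof(stmt->relname), shardId) != 0)
    {
        return -1;
    }

    if (stmt->indexname[0] != '\0')
    {
        return SketchAppendShardId(stmt->indexname, sizeof(stmt->indexname),
                                   shardId);
    }

    return 0;
}
```

Because the structure is mutated directly, nothing in this sketch depends on which command the statement represents, which mirrors why the relabel needs no command-specific deparser.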

Files

  • src/include/pg_version_compat.h — RepackStmt shim + version-portable helpers ClusterStmtIsRepack() / ClusterStmtCommandName() (compiled out on < PG19).
  • src/backend/distributed/commands/cluster.c — PreprocessClusterStmt is command-aware (correct REPACK vs CLUSTER wording in the no-relation WARNING, the partitioned WARNING, and the VERBOSE ERROR).
  • src/backend/distributed/commands/distribute_object_ops.c — GetDistributeObjectOps routes both CLUSTER and REPACK (shared T_ClusterStmt/T_RepackStmt tag).
  • src/backend/distributed/relay/relay_event_utility.c — name-relay reads the RepackStmt layout (relation + index name).
  • src/test/regress/sql/pg19.sql, expected/pg19.out, expected/pg19_0.out — new PG19-only acceptance test (+ the < PG19 early-quit variant).
  • src/test/regress/multi_1_create_citus_schedule — registers the pg19 test.
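A rough sketch of how version-portable helpers of this kind could be shaped, with constant fallbacks on older majors. The `Sketch` prefix marks every type and function as hypothetical; only the helper names without the prefix and the partitioned-table warning text come from this PR. The default `PG_VERSION_NUM` is defined here solely so the snippet compiles outside a PostgreSQL build.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#ifndef PG_VERSION_NUM
#define PG_VERSION_NUM 180000 /* assumption: standalone build, pre-PG19 */
#endif

#if PG_VERSION_NUM >= 190000

/* Hypothetical stand-in for the PG19 statement layout. */
typedef enum SketchRepackCommand
{
    SKETCH_CMD_CLUSTER,
    SKETCH_CMD_REPACK
} SketchRepackCommand;

typedef struct SketchClusterStmt
{
    SketchRepackCommand command;
} SketchClusterStmt;

static bool
SketchClusterStmtIsRepack(const SketchClusterStmt *stmt)
{
    return stmt->command == SKETCH_CMD_REPACK;
}

#else

/* Before PG19 there is no command field, so the answer is constant. */
typedef struct SketchClusterStmt
{
    int unused;
} SketchClusterStmt;

static bool
SketchClusterStmtIsRepack(const SketchClusterStmt *stmt)
{
    (void) stmt;
    return false;
}

#endif

static const char *
SketchClusterStmtCommandName(const SketchClusterStmt *stmt)
{
    return SketchClusterStmtIsRepack(stmt) ? "REPACK" : "CLUSTER";
}

/* Format the partitioned-table warning with the command-aware name. */
static int
SketchPartitionedWarning(char *buf, size_t size, const SketchClusterStmt *stmt)
{
    return snprintf(buf, size,
                    "not propagating %s command for partitioned table "
                    "to worker nodes",
                    SketchClusterStmtCommandName(stmt));
}
```

Callers in this sketch never test the server version themselves; they ask the helper and get `CLUSTER` wording on older majors without any conditional code at the call site.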

Acceptance test (PG19-only, pg19.sql)

The test distributes repack_test (4 shards, replication factor 1) and demonstrates (not merely asserts) that distributed REPACK works:

  • REPACK <t> USING INDEX <idx>, bare REPACK <t>, and CLUSTER <t> USING <idx> each rewrite every shard placement — the relfilenode changes on all shards (verified via run_command_on_shards).
  • Distribution-invariance after the rewrites — proves REPACK changes storage, not distribution semantics:
    • placements_unchanged — shard set + placement nodes identical before/after (symmetric EXCEPT).
    • shard_count_unchanged — shard count identical before/after.
    • shard_row_mapping_unchanged — per-shard count(*) identical before/after (no row crossed a shard boundary).
    • distribution_unchanged — (partmethod, partkey, colocationid) intact.
    • routing still works — router queries a = 42 → (42, 2) and a = 100 → (100, 0); cross-shard aggregate count = 100, sum = 5050.
  • Command-aware messaging — VERBOSE → REPACK/CLUSTER-worded ERROR; partitioned distributed table → not propagating REPACK command for partitioned table to worker nodes WARNING.

The < PG19 early-quit path keeps PG17/PG18 on the unchanged pg19_0.out, so this test is invisible to older majors.
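The symmetric EXCEPT behind placements_unchanged amounts to checking that neither snapshot contains a placement the other lacks. The test itself is written in SQL; the following is only a standalone C illustration of that set logic, with a hypothetical `SketchPlacement` pair of shard id and node port standing in for a placement row.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one (shard, node) placement row. */
typedef struct SketchPlacement
{
    uint64_t shardId;
    int nodePort;
} SketchPlacement;

static bool
SketchContains(const SketchPlacement *set, size_t count, SketchPlacement item)
{
    for (size_t i = 0; i < count; i++)
    {
        if (set[i].shardId == item.shardId && set[i].nodePort == item.nodePort)
        {
            return true;
        }
    }
    return false;
}

/* Number of rows in "left" that are missing from "right" (left EXCEPT right). */
static size_t
SketchExceptCount(const SketchPlacement *left, size_t leftCount,
                  const SketchPlacement *right, size_t rightCount)
{
    size_t missing = 0;

    for (size_t i = 0; i < leftCount; i++)
    {
        if (!SketchContains(right, rightCount, left[i]))
        {
            missing++;
        }
    }
    return missing;
}

/* Unchanged only if both directions of the difference are empty. */
static bool
SketchPlacementsUnchanged(const SketchPlacement *before, size_t beforeCount,
                          const SketchPlacement *after, size_t afterCount)
{
    return SketchExceptCount(before, beforeCount, after, afterCount) == 0 &&
           SketchExceptCount(after, afterCount, before, beforeCount) == 0;
}
```

Checking both directions matters: a one-sided difference would miss a placement that appeared only after the rewrite.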

Validation gate (WSL, pgenv, full -Werror, no demotion)

| Version | Build (-Werror) | Regress |
|---|---|---|
| PG17.10 | clean | 4/4, no diffs (pg19 early-\q → unchanged pg19_0.out) |
| PG18.4 | clean | 4/4, no diffs (pg19 early-\q → unchanged pg19_0.out) |
| PG19beta1 | clean | 4/4, distributed-REPACK acceptance test green (all invariance assertions t) |

CLUSTER's existing tests/expected (multi_index_statements, pg15) are unchanged on all versions.

@codecov

codecov Bot commented Jun 28, 2026

Codecov Report

❌ Patch coverage is 75.00000% with 1 line in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (pg19-support@1ef844c). Learn more about missing BASE report.

Additional details and impacted files
@@               Coverage Diff               @@
##             pg19-support    #8624   +/-   ##
===============================================
  Coverage                ?   88.95%           
===============================================
  Files                   ?      288           
  Lines                   ?    64392           
  Branches                ?     8096           
===============================================
  Hits                    ?    57278           
  Misses                  ?     4780           
  Partials                ?     2334           

@ihalatci ihalatci force-pushed the ihalatci-pg19-repack-cluster-dispatch branch from 32b630f to d98bdb9 Compare June 29, 2026 10:41
PG19 replaced ClusterStmt with the unified RepackStmt, which backs both
the legacy CLUSTER command and the new REPACK command under a single node
tag (T_RepackStmt). The foundation shim in pg_version_compat.h aliases the
old type/tag names so existing Citus code keeps compiling; this change makes
the command handling correct rather than merely compiling.

Citus now discriminates CLUSTER vs REPACK via RepackStmt->command, exposed
through version-portable helpers ClusterStmtIsRepack() and
ClusterStmtCommandName() (constant CLUSTER/false fallbacks pre-PG19).
PreprocessClusterStmt propagates REPACK exactly like CLUSTER -- the original
command text is shipped to every shard placement -- and only varies the
user-facing WARNING/ERROR wording by command. The worker name-relay in
relay_event_utility.c appends the shardId to both the target relation name
and the optional USING INDEX name for either command. VACUUM FULL is
unaffected: core keeps dispatching it through T_VacuumStmt, so it never
reaches this CLUSTER/REPACK path.

Adds a PG19-only regress test (pg19) that distributes a table and proves
REPACK ... USING INDEX, bare REPACK, and CLUSTER ... USING each rewrite every
shard placement (relfilenode change), preserve row data, and that VERBOSE is
rejected with command-aware wording. PG17/18 run the early-quit variant
(pg19_0.out) and stay regression-neutral.

Validated under full -Werror on PG17.10, PG18.4, and PG19beta1 -- all green.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@ihalatci ihalatci force-pushed the ihalatci-pg19-repack-cluster-dispatch branch from d98bdb9 to c3f090a Compare June 29, 2026 10:58
@ihalatci ihalatci marked this pull request as ready for review July 1, 2026 08:13