Fix failing pgupgrade if names are too long #1649

Open
Kajot-dev wants to merge 2 commits into percona:main from Kajot-dev:fix/failing-pg-upgrade

@Kajot-dev

Contributor

CHANGE DESCRIPTION

Problem:
When the pgupgrade name or cluster name is too long, the generated Job names for pgupgrade exceed 63 characters and the upgrade fails silently: the failure is visible in the logs but not to the user, who sees the upgrade as progressing indefinitely.

Solution:
We need to trim the name (and preserve uniqueness) to

CHECKLIST

Jira

  • Is the Jira ticket created and referenced properly?
  • Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
  • Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

  • Is an E2E test/test case added for the new feature/change?
  • Are unit tests added where appropriate?

Config/Logging/Testability

  • Are all needed new/changed options added to default YAML files?
  • Are all needed new/changed options added to the Helm Chart?
  • Did we add proper logging messages for operator actions?
  • Did we ensure compatibility with the previous version or cluster upgrade process?
  • Does the change support oldest and newest supported PG version?
  • Does the change support oldest and newest supported Kubernetes version?

Signed-off-by: jjaruszewski <jjaruszewski@man.poznan.pl>

Copilot AI left a comment

Contributor

Pull request overview

This PR addresses pgupgrade Jobs silently failing when generated Kubernetes Job names exceed the DNS-1123 label limit (63 chars), by introducing DNS-safe naming helpers and switching pgupgrade Jobs to use a truncated/unique name approach.

Changes:

  • Added SafeDNSName / SafeDNSUniqueName helpers in internal/naming/dns.go to enforce the 63-character DNS label limit.
  • Updated pgupgrade and remove-data Job name generation to use the new DNS-safe unique naming helper.
  • Updated pgupgrade reconciliation to locate the upgrade Job by labels (instead of by deterministic name), and added a unit test for long upgrade names.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| internal/naming/dns.go | Introduces helpers to truncate DNS label names and generate “unique” names under the 63-char limit. |
| internal/controller/pgupgrade/pgupgrade_controller.go | Changes upgrade Job lookup logic to search Jobs by role label. |
| internal/controller/pgupgrade/jobs.go | Switches pgupgrade/remove-data Job naming to use SafeDNSUniqueName. |
| internal/controller/pgupgrade/jobs_test.go | Adds a unit test validating long-name Job generation stays within DNS limits. |

Comment thread internal/controller/pgupgrade/pgupgrade_controller.go
Comment thread internal/naming/dns.go
Comment thread internal/controller/pgupgrade/jobs.go
Comment thread internal/controller/pgupgrade/jobs.go
Signed-off-by: jjaruszewski <jjaruszewski@man.poznan.pl>
Copilot AI review requested due to automatic review settings June 22, 2026 16:13
@Kajot-dev Kajot-dev force-pushed the fix/failing-pg-upgrade branch from f50582a to 57ffb08 Compare June 22, 2026 16:13

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment thread internal/naming/dns.go
Comment on lines +39 to +61
// SafeDNSUniqueName ensures the name fits within the 63-character DNS-1123 label limit.
// If the name exceeds the limit, it truncates to 58 characters and appends a 4-character
// deterministic suffix based on the input name to maintain consistency across reconciles.
// It also ensures the name doesn't end with a hyphen, which is invalid for DNS labels.
// This is useful for resources that need unique names like Jobs or Pods.
func SafeDNSUniqueName(name string) string {
	if len(name) <= maxDNSSafeLength {
		return name
	}

	// Reserve 5 characters for the dash + 4 char suffix
	prefix := name[:maxDNSSafeLength-5]
	// Strip trailing hyphens from the truncated prefix
	prefix = strings.TrimRight(prefix, "-")

	// Use a deterministic suffix based on the full name (not random!)
	// This ensures the same name always produces the same output across reconciles
	hash := fnv.New32a()
	hash.Write([]byte(name))
	suffix := rand.SafeEncodeString(fmt.Sprint(hash.Sum32()))[:4]

	return prefix + "-" + suffix
}

// Verify the job name fits within DNS limits and has the correct format
assert.Assert(t, len(longJob.Name) <= 63, "job name %q exceeds 63 characters", longJob.Name)
assert.Assert(t, len(longJob.Name) == 63, "truncated job name %q should be exactly 63 characters", longJob.Name)
@JNKPercona

Collaborator
| Test Name | Result | Time |
| --- | --- | --- |
| backup-enable-disable | passed | 00:15:24 |
| builtin-extensions | passed | 00:06:10 |
| cert-manager-tls | passed | 00:10:01 |
| custom-envs | passed | 00:20:15 |
| custom-tls | passed | 00:08:22 |
| database-init-sql | passed | 00:03:34 |
| demand-backup | failure | 00:25:44 |
| demand-backup-offline-snapshot | passed | 00:14:12 |
| dynamic-configuration | passed | 00:04:47 |
| finalizers | passed | 00:03:43 |
| init-deploy | passed | 00:03:26 |
| huge-pages | passed | 00:02:57 |
| major-upgrade-14-to-15 | passed | 00:13:43 |
| major-upgrade-15-to-16 | passed | 00:11:53 |
| major-upgrade-16-to-17 | passed | 00:10:30 |
| major-upgrade-17-to-18 | passed | 00:11:10 |
| ldap | passed | 00:06:01 |
| ldap-tls | passed | 00:06:53 |
| monitoring | passed | 00:08:20 |
| monitoring-pmm3 | passed | 00:09:27 |
| one-pod | passed | 00:06:16 |
| operator-self-healing | passed | 00:11:34 |
| pitr | passed | 00:12:03 |
| scaling | passed | 00:05:31 |
| scheduled-backup | passed | 00:32:10 |
| self-healing | passed | 00:09:11 |
| sidecars | passed | 00:02:59 |
| standby-pgbackrest | passed | 00:19:35 |
| standby-streaming | passed | 00:13:13 |
| start-from-backup | passed | 00:11:43 |
| tablespaces | passed | 00:07:06 |
| telemetry-transfer | passed | 00:04:40 |
| upgrade-consistency | passed | 00:08:57 |
| upgrade-minor | passed | 00:06:50 |
| users | passed | 00:04:44 |

| Summary | Value |
| --- | --- |
| Tests Run | 35/35 |
| Job Duration | 02:00:04 |
| Total Test Time | 05:53:23 |

commit: 57ffb08
image: perconalab/percona-postgresql-operator:PR-1649-57ffb0868
