Changes from 43 commits
47 commits
8d39567
rename upstream api package
egegunes Apr 20, 2026
bb21125
fix old api references
egegunes Apr 20, 2026
3ade8de
rename CRDs
egegunes Apr 20, 2026
c4baaf7
fix api version
egegunes Apr 20, 2026
ac6f625
remove testing dir
egegunes Apr 20, 2026
3955649
fix rbacs
egegunes Apr 20, 2026
5cee9a4
add migration controller
egegunes Apr 20, 2026
5e81eac
update manifests
egegunes Apr 21, 2026
92f6a31
fix unit tests
egegunes Apr 22, 2026
8f121cc
fix upgrade-minor
egegunes Apr 23, 2026
bb84b04
Merge branch 'main' into crd-rename
mayankshah1607 Apr 23, 2026
50cd1dd
Merge branch 'main' into crd-rename
hors Apr 23, 2026
b1b4cfd
fix typo
egegunes Apr 24, 2026
20baf17
e2e-tests: add Crunchy Data migration test suite
hors Apr 27, 2026
2c70a21
Merge branch 'main' into crd-rename
hors Apr 27, 2026
855095d
address mayank's comments
egegunes Apr 27, 2026
87e562b
remove references to testing/
egegunes Apr 27, 2026
0aefa6e
use require to assert owner reference
egegunes Apr 27, 2026
5986a05
fix issues due to review comments
egegunes Apr 27, 2026
66a29e0
address review comments
egegunes Apr 28, 2026
497af8d
Merge branch 'main' into crd-rename
egegunes Apr 28, 2026
25d5c98
Push stranded timeline history files after stanza creation
hors Apr 28, 2026
ad66ef7
fix timeouts
hors Apr 28, 2026
706bb71
fix test
hors Apr 28, 2026
28c6eb5
K8SPG-1000 Add extensions major upgrade tests
jvpasinatto Apr 8, 2026
bf40ab8
consider in docker.io prefix in get_container_image function
jvpasinatto Apr 9, 2026
1cae3be
CLOUD-727: Bump github.com/Azure/go-ntlmssp
dependabot[bot] Apr 23, 2026
6816797
address mayank's comments
egegunes Apr 27, 2026
13bebd2
remove references to testing/
egegunes Apr 27, 2026
a854cac
use require to assert owner reference
egegunes Apr 27, 2026
4cd87dd
fix issues due to review comments
egegunes Apr 27, 2026
5335d2b
address review comments
egegunes Apr 28, 2026
62f329d
CLOUD-727: Bump k8s.io/apimachinery from 0.35.4 to 0.36.0 (#1568)
dependabot[bot] Apr 28, 2026
b0328f7
fix timeouts
hors Apr 28, 2026
bc3d09a
Push stranded timeline history files after stanza creation
hors Apr 28, 2026
150b6d0
Merge branch 'crd-rename' into stanza-timing-fix
hors Apr 28, 2026
fb73327
fix tests
hors Apr 29, 2026
847ccad
Merge branch 'crd-rename' into stanza-timing-fix
hors Apr 29, 2026
f1ede33
remove migration controller
egegunes Apr 29, 2026
f15c14a
fix updating owner references of issuer & certificates
egegunes Apr 29, 2026
b71131c
add status condition for migration status
egegunes Apr 30, 2026
783a445
Merge branch 'crd-rename' into stanza-timing-fix
hors Apr 30, 2026
81b3319
Merge remote-tracking branch 'origin/main' into stanza-timing-fix
hors Jun 11, 2026
27557c3
fix file
hors Jun 11, 2026
5557553
Apply suggestion from @github-actions[bot]
hors Jun 11, 2026
5191d59
Apply suggestion from @github-actions[bot]
hors Jun 11, 2026
26ce453
fix test upgrade-minor
hors Jun 11, 2026
@@ -1,3 +1,5 @@
<<<<<<<< HEAD:config/crd/bases/postgres-operator.crunchydata.com_postgresclusters.yaml
========
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
@@ -31127,3 +31129,4 @@ spec:
storage: true
subresources:
status: {}
>>>>>>>> origin/main:config/crd/bases/upstream.pgv2.percona.com_postgresclusters.yaml
3 changes: 3 additions & 0 deletions e2e-tests/run-pr.csv
@@ -33,3 +33,6 @@ telemetry-transfer
upgrade-consistency
upgrade-minor
users
migration-from-crunchy-standby
migration-from-crunchy-pv
migration-from-crunchy-backup-restore
@@ -0,0 +1,7 @@
apiVersion: kuttl.dev/v1beta1
kind: TestAssert
timeout: 30
commands:
- script: |-
set -o errexit
kubectl get configmap -n "${NAMESPACE}" 10-second-batch-written
@@ -0,0 +1,38 @@
apiVersion: kuttl.dev/v1beta1
kind: TestStep
commands:
- script: |-
set -o errexit
set -o xtrace

source ../../functions

primary=$(kubectl get pod -n "${NAMESPACE}" \
-l postgres-operator.crunchydata.com/cluster=percona-restored,postgres-operator.crunchydata.com/role=primary \
-o jsonpath='{.items[0].metadata.name}')

if [ -z "${primary}" ]; then
echo "ERROR: primary pod not found"
exit 1
fi

# Insert the second batch of rows that must survive the upcoming PITR.
# The PITR target is captured in step 12 AFTER the step 11 backup
# completes, so the backup's start time is guaranteed to be before the
# target and pgBackRest can use it for the restore.
kubectl exec -n "${NAMESPACE}" "${primary}" -c database -- \
psql -d migrationtest -c "
INSERT INTO migration_data VALUES
(5, 'second-batch-one'),
(6, 'second-batch-two'),
(7, 'second-batch-three');
"

# Force a WAL switch so the inserted rows reach the archive before the
# step 11 backup starts.
kubectl exec -n "${NAMESPACE}" "${primary}" -c database -- \
psql -q -c "SELECT pg_switch_wal();"

kubectl create configmap -n "${NAMESPACE}" 10-second-batch-written \
--from-literal=rows="5,6,7"
timeout: 120
@@ -0,0 +1,29 @@
apiVersion: kuttl.dev/v1beta1
kind: TestAssert
timeout: 560
---
kind: Job
apiVersion: batch/v1
metadata:
annotations:
postgres-operator.crunchydata.com/pgbackrest-backup: second-post-restore-backup
labels:
postgres-operator.crunchydata.com/pgbackrest-backup: manual
postgres-operator.crunchydata.com/pgbackrest-repo: repo1
ownerReferences:
- apiVersion: pgv2.percona.com/v2
kind: PerconaPGBackup
controller: true
blockOwnerDeletion: true
status:
succeeded: 1
---
apiVersion: pgv2.percona.com/v2
kind: PerconaPGBackup
metadata:
name: second-post-restore-backup
spec:
pgCluster: percona-restored
repoName: repo1
status:
state: Succeeded
@@ -0,0 +1,9 @@
apiVersion: pgv2.percona.com/v2
kind: PerconaPGBackup
metadata:
name: second-post-restore-backup
spec:
pgCluster: percona-restored
repoName: repo1
options:
- --type=full
@@ -0,0 +1,60 @@
apiVersion: kuttl.dev/v1beta1
kind: TestAssert
timeout: 600
---
apiVersion: pgv2.percona.com/v2
kind: PerconaPGRestore
metadata:
name: second-pitr-restore
status:
state: Succeeded
---
# One StatefulSet per pod; readyReplicas=1 each. Aggregate validated below.
apiVersion: apps/v1
kind: StatefulSet
metadata:
labels:
postgres-operator.crunchydata.com/cluster: percona-restored
postgres-operator.crunchydata.com/data: postgres
postgres-operator.crunchydata.com/instance-set: instance1
ownerReferences:
- apiVersion: upstream.pgv2.percona.com/v1beta1
kind: PostgresCluster
name: percona-restored
controller: true
blockOwnerDeletion: true
status:
availableReplicas: 1
readyReplicas: 1
replicas: 1
updatedReplicas: 1
---
apiVersion: upstream.pgv2.percona.com/v1beta1
kind: PostgresCluster
metadata:
name: percona-restored
status:
instances:
- name: instance1
readyReplicas: 3
replicas: 3
updatedReplicas: 3
pgbackrest:
restore:
finished: true
id: second-pitr-restore
succeeded: 1
---
apiVersion: pgv2.percona.com/v2
kind: PerconaPGCluster
metadata:
name: percona-restored
status:
postgres:
instances:
- name: instance1
ready: 3
size: 3
ready: 3
size: 3
state: ready
@@ -0,0 +1,73 @@
apiVersion: kuttl.dev/v1beta1
kind: TestStep
commands:
- script: |-
set -o errexit
set -o xtrace

source ../../functions

# No scale-down needed: the cluster is on TL3 and the stanza existed
# before this timeline. When postgres promotes TL3→TL4 after the PITR,
# 00000004.history is pushed by the async archiver immediately (stanza
# exists), so pg_rewind on the two replicas can trace the ancestry and
# rejoin without the scale-to-1 workaround required for step 08.

primary=$(kubectl get pod -n "${NAMESPACE}" \
-l postgres-operator.crunchydata.com/cluster=percona-restored,postgres-operator.crunchydata.com/role=primary \
-o jsonpath='{.items[0].metadata.name}')

if [ -z "${primary}" ]; then
echo "ERROR: primary pod not found"
exit 1
fi

# Capture the PITR target NOW — after the step 11 backup has completed.
# This guarantees the backup start time < target < after-target rows,
# which is the only valid ordering for a time-based pgBackRest restore.
pitr_target=$(kubectl exec -n "${NAMESPACE}" "${primary}" -c database -- \
psql -q -t -c "SELECT to_char(clock_timestamp(), 'YYYY-MM-DD HH24:MI:SS')" \
| xargs)
echo "PITR target: ${pitr_target}"

# Force a WAL switch so the LSN at pitr_target is flushed to the archive
# before we write the rows that must be absent after restore.
kubectl exec -n "${NAMESPACE}" "${primary}" -c database -- \
psql -q -c "SELECT pg_switch_wal();"

# Write rows that must be absent after the PITR restore.
kubectl exec -n "${NAMESPACE}" "${primary}" -c database -- \
psql -d migrationtest -c "
INSERT INTO migration_data VALUES
(8, 'after-pitr-target-one'),
(9, 'after-pitr-target-two');
"

# Resolve the latest full backup label (step 11's second-post-restore-
# backup). Because that backup completed before pitr_target was
# captured above, --set is safe: backup-start < pitr_target.
backup_label=$(kubectl -n "${NAMESPACE}" exec "${primary}" -- \
pgbackrest info --output json --log-level-console=info --stanza=db \
| jq -r '[.[] | .backup[] | select(.type == "full") | select(.database.["repo-key"] == 1)][-1].label')

if [ -z "${backup_label}" ] || [ "${backup_label}" = "null" ]; then
echo "ERROR: could not determine latest full backup label"
exit 1
fi
echo "Restoring from backup: ${backup_label}"

cat <<EOF | kubectl -n "${NAMESPACE}" apply -f -
apiVersion: pgv2.percona.com/v2
kind: PerconaPGRestore
metadata:
name: second-pitr-restore
spec:
pgCluster: percona-restored
repoName: repo1
options:
- --set=${backup_label}
- --type=time
- --target="${pitr_target}"
- --target-timeline=current
EOF
timeout: 300
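The ordering constraint stated in the step above (backup start < PITR target < post-target writes) can be sketched as a small standalone check. This is only an illustration under assumptions: `validPITROrdering` and `pitrLayout` are hypothetical names, the timestamps are made up, and nothing here calls pgBackRest or the operator. The layout string is chosen to match the `'YYYY-MM-DD HH24:MI:SS'` format the step uses when capturing `pitr_target`.

```go
package main

import (
	"fmt"
	"time"
)

// pitrLayout is the Go layout matching the to_char format
// 'YYYY-MM-DD HH24:MI:SS' used when the step captures pitr_target.
const pitrLayout = "2006-01-02 15:04:05"

// validPITROrdering reports whether backupStart < target < postTargetWrite,
// the ordering the test step relies on. All three inputs are hypothetical
// timestamps supplied by the caller; nothing here talks to pgBackRest.
func validPITROrdering(backupStart, target, postTargetWrite string) (bool, error) {
	times := make([]time.Time, 0, 3)
	for _, s := range []string{backupStart, target, postTargetWrite} {
		t, err := time.Parse(pitrLayout, s)
		if err != nil {
			return false, fmt.Errorf("parse %q: %w", s, err)
		}
		times = append(times, t)
	}
	return times[0].Before(times[1]) && times[1].Before(times[2]), nil
}

func main() {
	// Illustrative values only: backup started, then the target was
	// captured, then the rows that must be absent were written.
	ok, err := validPITROrdering(
		"2026-04-28 10:00:00",
		"2026-04-28 10:05:00",
		"2026-04-28 10:05:03",
	)
	fmt.Println(ok, err)
}
```

Because the target has one-second resolution, a write landing in the same second as the captured target would compare as equal rather than later, so the sketch treats that case as invalid.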
@@ -0,0 +1,26 @@
apiVersion: kuttl.dev/v1beta1
kind: TestAssert
timeout: 30
commands:
- script: |-
set -o errexit

data=$(kubectl get configmap 13-restored-data -n "${NAMESPACE}" \
-o jsonpath='{.data.data}')

for expected in "row-one" "row-two" "row-three" "written-after-restore" \
"second-batch-one" "second-batch-two" "second-batch-three"; do
echo "${data}" | grep -q "${expected}" || {
echo "ERROR: '${expected}' missing after second PITR restore. Got: ${data}"
exit 1
}
done

for absent in "after-pitr-target-one" "after-pitr-target-two"; do
if echo "${data}" | grep -q "${absent}"; then
echo "ERROR: '${absent}' present after second PITR restore (should be absent). Got: ${data}"
exit 1
fi
done

echo "All expected rows present, post-target rows correctly absent"
@@ -0,0 +1,46 @@
apiVersion: kuttl.dev/v1beta1
kind: TestStep
commands:
- script: |-
set -o errexit
set -o xtrace

source ../../functions

primary=$(kubectl get pod -n "${NAMESPACE}" \
-l postgres-operator.crunchydata.com/cluster=percona-restored,postgres-operator.crunchydata.com/role=primary \
-o jsonpath='{.items[0].metadata.name}')

if [ -z "${primary}" ]; then
echo "ERROR: primary pod not found after second PITR restore"
exit 1
fi

data=$(kubectl exec -n "${NAMESPACE}" "${primary}" -c database -- bash -c \
"psql -q -t -d migrationtest -c 'SELECT id, value FROM migration_data ORDER BY id;'")

echo "Data after second PITR restore:"
echo "${data}"

kubectl create configmap -n "${NAMESPACE}" 13-restored-data \
--from-literal=data="${data}"

# Original rows plus the second batch must all be present.
for expected in "row-one" "row-two" "row-three" "written-after-restore" \
"second-batch-one" "second-batch-two" "second-batch-three"; do
echo "${data}" | grep -q "${expected}" || {
echo "ERROR: '${expected}' missing after second PITR restore"
exit 1
}
done

# Rows written after the PITR target must be absent.
for absent in "after-pitr-target-one" "after-pitr-target-two"; do
if echo "${data}" | grep -q "${absent}"; then
echo "ERROR: '${absent}' present after second PITR restore (should be absent)"
exit 1
fi
done

echo "PASS: all expected rows present, post-target rows correctly absent"
timeout: 120
@@ -12,6 +12,8 @@ commands:

source ../../functions

kubectl -n ${NAMESPACE} patch postgrescluster upgrade-minor --type=json -p='[{"op": "remove", "path": "/metadata/finalizers"}]'
kubectl -n ${NAMESPACE} delete postgrescluster upgrade-minor || true
remove_all_finalizers
check_operator_panic
destroy_operator
2 changes: 1 addition & 1 deletion internal/controller/postgrescluster/instance.go
@@ -1100,7 +1100,7 @@ func (r *Reconciler) scaleUpInstances(
next := naming.GenerateInstance(cluster, set)
// if there are any available instance names (as determined by observing any PVCs for the
// instance set that are not currently associated with an instance, e.g. in the event the
- // instance STS was deleted), then reuse them instead of generating a new name
+ // instance STS was deleted), then reuse them instead of generating a new name.
if len(availableInstanceNames) > 0 {
next.Name = availableInstanceNames[0]
availableInstanceNames = availableInstanceNames[1:]
15 changes: 15 additions & 0 deletions internal/controller/postgrescluster/pgbackrest.go
@@ -3037,6 +3037,21 @@ func (r *Reconciler) reconcileStanzaCreate(ctx context.Context,
r.Recorder.Event(postgresCluster, corev1.EventTypeNormal, EventStanzasCreated,
"pgBackRest stanza creation completed successfully")

// Re-push any timeline history files stranded by the async-archiver race:
// postgres archives 00000002.history during bootstrap promotion before the
// stanza exists; pgBackRest drops it silently (error 103) and postgres
// never retries. Without it pg_rewind fails on replicas after PITR.
log := logging.FromContext(ctx)
historyOut, historyErr := pgbackrest.Executor(exec).ArchivePushHistoryFiles(ctx)
if historyErr != nil {
r.Recorder.Event(postgresCluster, corev1.EventTypeWarning,
"ArchivePushHistoryFilesFailed", historyErr.Error())
log.Error(historyErr, "timeline history file recovery failed",
"pod", writableInstanceName, "output", historyOut)
} else if historyOut != "" {
log.Info("timeline history file recovery", "output", historyOut)
}
Comment on lines +3040 to +3053

Contributor


I understand we need to do this after the stanza is created, but I wonder if we can do it in the caller of this function, reconcilePGBackRest, after line 1642 and only if configHashMismatch is false.

Collaborator Author


@egegunes Moving it to the caller isn't straightforward. reconcileStanzaCreate returns (false, nil) in three distinct cases:

1. Stanza just created successfully ← the only case where we want ArchivePushHistoryFiles
2. Stanza already created (stanzasCreated == true) ← should not run it again
3. Cluster not yet writable (!clusterWritable) ← exec would target the wrong pod / fail

So checking !configHashMismatch && err == nil at the caller doesn't let us distinguish case 1 from cases 2 and 3. We'd be exec-ing into a pod on every reconcile, including when the cluster isn't writable yet.

To make it work at the caller level we'd need a third return value (e.g. stanzaJustCreated bool), and we'd also have to either return writableInstanceName or re-discover the writable instance in the caller, since the exec closure is built around it.

Given that reconcileStanzaCreate already has the writable instance name in scope and the call sits naturally right after successful stanza creation, I'd keep it where it is, but up to you.
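
The three cases in the reply above, and the third return value it mentions, can be sketched roughly as follows. This is a simplified illustration, not the operator's code: `stanzaState` and `reconcileStanzaCreateSketch` are hypothetical names, the real function takes a context, a cluster, and more, and the stanza creation itself is only pretended here.

```go
package main

import "fmt"

// stanzaState is a hypothetical stand-in for the conditions that
// reconcileStanzaCreate inspects; the field names follow the thread.
type stanzaState struct {
	clusterWritable bool
	stanzasCreated  bool
}

// reconcileStanzaCreateSketch mirrors the three cases from the thread.
// The first and last return values correspond to the (false, nil) pair
// the reply describes; justCreated is the hypothetical third value that
// would let a caller tell case 1 apart from cases 2 and 3.
func reconcileStanzaCreateSketch(s stanzaState) (configHashMismatch bool, justCreated bool, err error) {
	if !s.clusterWritable {
		return false, false, nil // case 3: no writable pod to exec into yet
	}
	if s.stanzasCreated {
		return false, false, nil // case 2: stanza already exists
	}
	// case 1: pretend stanza creation ran and succeeded
	return false, true, nil
}

func main() {
	cases := []struct {
		name  string
		state stanzaState
	}{
		{"case 1: stanza just created", stanzaState{clusterWritable: true}},
		{"case 2: stanza already created", stanzaState{clusterWritable: true, stanzasCreated: true}},
		{"case 3: cluster not writable", stanzaState{}},
	}
	for _, c := range cases {
		mismatch, justCreated, err := reconcileStanzaCreateSketch(c.state)
		fmt.Printf("%s: mismatch=%v err=%v justCreated=%v\n", c.name, mismatch, err, justCreated)
	}
}
```

All three cases print the same `mismatch=false err=<nil>` pair, which is why the caller cannot separate them without the extra flag. Even with the flag, the caller would still need the writable instance name to build the exec call.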


// if no errors then stanza(s) created successfully
for i := range postgresCluster.Status.PGBackRest.Repos {
postgresCluster.Status.PGBackRest.Repos[i].StanzaCreated = true