Add ignore_slots support to Patroni DCS configuration #3021

Open
PavelZaytsev wants to merge 7 commits into zalando:master from PavelZaytsev:add-ignore-slots-dcs-config

@PavelZaytsev PavelZaytsev commented Dec 24, 2025

Motivation

PostgreSQL 17 introduced native logical replication slot synchronization, in which logical slots are synced to standby servers. However, when both features are enabled, Patroni's own logical slot failover handling interferes with the native synchronization, resulting in faulty behavior. This PR adds support for Patroni's ignore_slots DCS configuration so that operators can exclude specific slots (e.g., all logical slots) from Patroni's slot management.

Implementation

  • Add ignore_slots field to Patroni struct in CRD
  • Add ignore_slots to patroniDCS struct for Spilo configuration
  • Generate ignore_slots in SPILO_CONFIGURATION when specified
  • Update CRD manifest to accept ignore_slots field
  • Add unit test for ignore_slots configuration

This enables PostgreSQL 17 (and user-defined) slot synchronization support by allowing users to configure Patroni to ignore specific replication slot types (e.g., logical slots) during failover operations.

Users can now configure ignore_slots in their PostgreSQL manifest:

  patroni:
    ignore_slots:
      - type: logical

This instructs Patroni to ignore logical replication slots during failover.
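
As a rough illustration of how such a manifest entry could end up in SPILO_CONFIGURATION, here is a minimal Go sketch. The type and function names (`spiloConfiguration`, `bootstrap`, `buildSpiloConfig`) and the `[]map[string]string` field type are assumptions made for this example; only the `patroniDCS` name and the `bootstrap.dcs.ignore_slots` JSON path come from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// patroniDCS is a simplified stand-in for the operator's struct of the same
// name; the field type is an assumption of this sketch.
type patroniDCS struct {
	IgnoreSlots []map[string]string `json:"ignore_slots,omitempty"`
}

// bootstrap and spiloConfiguration are hypothetical wrapper types that
// reproduce the bootstrap.dcs nesting of SPILO_CONFIGURATION.
type bootstrap struct {
	DCS patroniDCS `json:"dcs"`
}

type spiloConfiguration struct {
	Bootstrap bootstrap `json:"bootstrap"`
}

// buildSpiloConfig renders the JSON document; ignore_slots is omitted
// entirely when no matchers are specified.
func buildSpiloConfig(ignoreSlots []map[string]string) (string, error) {
	cfg := spiloConfiguration{Bootstrap: bootstrap{DCS: patroniDCS{IgnoreSlots: ignoreSlots}}}
	out, err := json.Marshal(cfg)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	s, err := buildSpiloConfig([]map[string]string{{"type": "logical"}})
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // {"bootstrap":{"dcs":{"ignore_slots":[{"type":"logical"}]}}}
}
```

The `omitempty` tag keeps the generated configuration unchanged for clusters that do not set ignore_slots.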

Testing

  • Added unit test in k8sres_test.go for ignore_slots configuration
  • All existing tests pass
  • Manually verified in live cluster that ignore_slots appears in:
    • PostgreSQL CRD spec
    • SPILO_CONFIGURATION environment variable
    • Patroni's live configuration (patronictl show-config)

@zalando-robot

Cannot start a pipeline due to:

No accountable user for this pipeline: no Zalando employee associated to this GitHub username

Click on pipeline status check Details link below for more information.

@PavelZaytsev
Author

Hi @mikkeloscar, could someone from the maintainers team look into this PR and run CI/CD when they get the chance?

@PavelZaytsev
Author

@FxKu @hughcapet @mikkeloscar can someone from the zalando team run the check-in pipeline on the PR plz

@baznikin

baznikin commented Mar 24, 2026

Every cluster switchover leads to a full initial Debezium snapshot.
We switched to failover slots, but after a replica restarts, Patroni tries to delete the replication slot on it (and does delete it as soon as the replica becomes master). We need this PR so badly.

2026-03-24 11:28:48,489 INFO: no action. I am (payments-pg-0), a secondary, and following a leader (payments-pg-1)
2026-03-24 11:28:58,478 INFO: Lock owner: payments-pg-1; I am payments-pg-0
2026-03-24 11:28:58,483 ERROR: Exception when changing replication slots
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/patroni/postgresql/slots.py", line 549, in sync_replication_slots
    self._drop_incorrect_slots(cluster, slots)
  File "/usr/local/lib/python3.10/dist-packages/patroni/postgresql/slots.py", line 347, in _drop_incorrect_slots
    active, dropped = self.drop_replication_slot(name)
  File "/usr/local/lib/python3.10/dist-packages/patroni/postgresql/slots.py", line 324, in drop_replication_slot
    rows = self._query(('WITH slots AS (SELECT slot_name, active'
  File "/usr/local/lib/python3.10/dist-packages/patroni/postgresql/slots.py", line 205, in _query
    return self._postgresql.query(sql, *params, retry=False)
  File "/usr/local/lib/python3.10/dist-packages/patroni/postgresql/__init__.py", line 408, in query
    return self._query(sql, *params)
  File "/usr/local/lib/python3.10/dist-packages/patroni/postgresql/__init__.py", line 385, in _query
    return self._connection.query(sql, *params)
  File "/usr/local/lib/python3.10/dist-packages/patroni/postgresql/connection.py", line 84, in query
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/patroni/postgresql/connection.py", line 74, in query
    cursor.execute(sql.encode('utf-8'), params or None)
psycopg2.errors.ObjectNotInPrerequisiteState: cannot drop replication slot "debezium"
DETAIL:  This replication slot is being synchronized from the primary server.

2026-03-24 11:28:58,502 INFO: no action. I am (payments-pg-0), a secondary, and following a leader (payments-pg-1)
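
For context, ignore_slots is a list of matchers over slot properties (name, type, database, plugin); a slot that matches one of them is left alone instead of being dropped. The Go sketch below only approximates that idea and is not Patroni's code (Patroni is written in Python). The names `slot`, `matches` and `isIgnored` are hypothetical, and the treatment of empty matchers and unknown keys is a choice of this sketch.

```go
package main

import "fmt"

// slot describes a replication slot by the properties a matcher can test.
type slot struct {
	Name, Type, Database, Plugin string
}

// matches reports whether every property set in the matcher equals the
// slot's value. Properties absent from the matcher are not checked.
// In this sketch, empty matchers and unknown keys never match.
func matches(matcher map[string]string, s slot) bool {
	attrs := map[string]string{
		"name":     s.Name,
		"type":     s.Type,
		"database": s.Database,
		"plugin":   s.Plugin,
	}
	for key, want := range matcher {
		got, known := attrs[key]
		if !known || got != want {
			return false
		}
	}
	return len(matcher) > 0
}

// isIgnored reports whether any matcher in the list matches the slot.
func isIgnored(matchers []map[string]string, s slot) bool {
	for _, m := range matchers {
		if matches(m, s) {
			return true
		}
	}
	return false
}

func main() {
	matchers := []map[string]string{{"type": "logical"}}
	debezium := slot{Name: "debezium", Type: "logical", Database: "payments", Plugin: "pgoutput"}
	fmt.Println(isIgnored(matchers, debezium))
}
```

With a matcher such as `type: logical`, the "debezium" slot from the traceback above would be skipped rather than passed to the drop logic.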

@ToGoBananas

ToGoBananas commented Mar 24, 2026

@FxKu @hughcapet
This looks like a serious issue with support for Postgres 17+ features. Please consider including this in the next release.

@FxKu
Member

FxKu commented Apr 29, 2026

LGTM. Only a mention in the docs is missing.

@FxKu FxKu added the minor label Apr 29, 2026
@FxKu FxKu added this to the 2.0.0 milestone Apr 29, 2026
@FxKu FxKu moved this to Open Questions in Postgres Operator Apr 29, 2026
@baznikin

@PavelZaytsev hi! Could you please update this PR with documentation?

@baznikin

baznikin commented May 20, 2026

While building my own version of the operator, I applied the patches on top of the v1.15.1 tag. I tried to add patroni.ignore_slots to a running cluster in order to simulate a production upgrade. It turned out that it failed validation, so I added the following validation:

--- a/pkg/apis/acid.zalan.do/v1/crds.go
+++ b/pkg/apis/acid.zalan.do/v1/crds.go
@@ -520,6 +520,21 @@ var PostgresCRDResourceValidation = apiextv1.CustomResourceValidation{
                                                        "failsafe_mode": {
                                                                Type: "boolean",
                                                        },
+                                                       "ignore_slots": {
+                                                               Type:     "array",
+                                                               Nullable: true,
+                                                               Items: &apiextv1.JSONSchemaPropsOrArray{
+                                                                       Schema: &apiextv1.JSONSchemaProps{
+                                                                               Type: "object",
+                                                                               AdditionalProperties: &apiextv1.JSONSchemaPropsOrBool{
+                                                                                       Allows: true,
+                                                                                       Schema: &apiextv1.JSONSchemaProps{
+                                                                                               Type: "string",
+                                                                                       },
+                                                                               },
+                                                                       },
+                                                               },
+                                                       },
                                                        "initdb": {
                                                                Type: "object",
                                                                AdditionalProperties: &apiextv1.JSONSchemaPropsOrBool{
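
The schema in the diff above accepts a nullable array of objects whose values are all strings. A rough way to see what that shape allows is to decode a value into the matching Go type; this is only a sketch using the standard library, with a hypothetical helper `parseIgnoreSlots`, and it does not go through the Kubernetes API server's validation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseIgnoreSlots decodes a JSON value into the shape described by the
// schema: a (possibly null) list of objects with string-valued properties.
func parseIgnoreSlots(raw []byte) ([]map[string]string, error) {
	var slots []map[string]string
	if err := json.Unmarshal(raw, &slots); err != nil {
		return nil, fmt.Errorf("ignore_slots must be a list of string-valued objects: %w", err)
	}
	return slots, nil
}

func main() {
	ok, err := parseIgnoreSlots([]byte(`[{"name":"debezium","type":"logical"}]`))
	fmt.Println(ok, err)

	// A non-string property value is rejected.
	_, err = parseIgnoreSlots([]byte(`[{"type":1}]`))
	fmt.Println(err != nil)
}
```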

Now the cluster is patched successfully, but no changes were made to the StatefulSet:

time="2026-05-20T16:18:45Z" level=info msg="UPDATE event has been queued" cluster-name=test-pg/pgtest pkg=controller worker=0
time="2026-05-20T16:18:45Z" level=info msg="update of the cluster started" cluster-name=test-pg/pgtest pkg=controller worker=0
time="2026-05-20T16:18:45Z" level=debug msg="-  kind: postgresql," cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="-  apiVersion: acid.zalan.do/v1," cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="-    resourceVersion: 10580784," cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="-    generation: 3," cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+    resourceVersion: 10581339," cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+    generation: 4," cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+      }," cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+      {" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+        manager: kubectl-edit," cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+        operation: Update," cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+        apiVersion: acid.zalan.do/v1," cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+        time: 2026-05-20T16:18:45Z," cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+        fieldsType: FieldsV1," cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+        fieldsV1: {" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+          f:spec: {" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+            f:patroni: {" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+              .: {}," cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+              f:ignore_slots: {}" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+            }" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+          }" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+        }" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="-    patroni: {}," cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+    patroni: {" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+      ignore_slots: [" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+        {" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+          database: core," cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+          name: debezium," cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+          plugin: pgoutput," cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+          type: logical" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+        }," cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+        {" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+          database: payments," cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+          name: debezium," cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+          plugin: pgoutput," cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+          type: logical" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+        }," cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+        {" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+          database: payments," cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+          name: my_failover_slot2," cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+          plugin: pgoutput," cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+          type: logical" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+        }" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+      ]" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="+    }," cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=info msg="postgresql major version unchanged or smaller, no changes needed" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="syncing master service" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="syncing replica service" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="syncing pgtest-config service" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="syncing pgtest-leader endpoint" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="syncing pgtest-config endpoint" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="syncing pgtest-sync endpoint" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="syncing pgtest-failover endpoint" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="syncing volumes using \"pvc\" storage resize mode" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="volume claim for volume \"pgdata-pgtest-0\" do not require updates" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="volume claim for volume \"pgdata-pgtest-1\" do not require updates" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="volume claims have been synced successfully" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="syncing statefulsets" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="syncing Patroni config" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="making GET http request: http://10.244.0.34:8008/config" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="making GET http request: http://10.244.0.36:8008/config" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="making GET http request: http://10.244.0.36:8008/patroni" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="making GET http request: http://10.244.0.34:8008/patroni" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="syncing roles" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="closing database connection" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=debug msg="syncing connection pooler (master, replica) from (false, false) to (false, false)" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=info msg="cluster version up to date. current: 170005, min desired: 170000" cluster-name=test-pg/pgtest pkg=cluster
time="2026-05-20T16:18:45Z" level=info msg="cluster has been updated" cluster-name=test-pg/pgtest pkg=controller worker=0

In the StatefulSet, the SPILO_CONFIGURATION variable hasn't changed:

      SPILO_CONFIGURATION:          {"postgresql":{},"bootstrap":{"initdb":[{"auth-host":"md5"},{"auth-local":"trust"}],"dcs":{"postgresql":{"parameters":{"hot_standby_feedback":"on","jit":"false","sync_replication_slots":"on","wal_level":"logical"}},"failsafe_mode":true}}}

Maybe some changes on the master branch make the patches from this PR stop working, or this PR is incomplete... I lack the knowledge to tell.

When I create a new cluster with ignore_slots, they are added to the configuration:

      SPILO_CONFIGURATION:          {"postgresql":{},"bootstrap":{"initdb":[{"auth-host":"md5"},{"auth-local":"trust"}],"dcs":{"postgresql":{"parameters":{"hot_standby_feedback":"on","jit":"false","sync_replication_slots":"on","wal_level":"logical"}},"failsafe_mode":true,"ignore_slots":[{"database":"core","name":"debezium","plugin":"pgoutput","type":"logical"},{"database":"payments","name":"my_failover_slot2","plugin":"pgoutput","type":"logical"}]}}}

@baznikin

The second missing piece of the puzzle is the sync code. I can't see how I can add commits to this PR.
FYI @PavelZaytsev

--- a/pkg/cluster/sync.go
+++ b/pkg/cluster/sync.go
@@ -900,6 +900,9 @@ func (c *Cluster) checkAndSetGlobalPostgreSQLConfiguration(pod *v1.Pod, effectiv
        requiresMasterRestart := false
 
        // compare effective and desired Patroni config options
+       if desiredPatroniConfig.IgnoreSlots != nil && !reflect.DeepEqual(desiredPatroniConfig.IgnoreSlots, effectivePatroniConfig.IgnoreSlots) {
+               configToSet["ignore_slots"] = desiredPatroniConfig.IgnoreSlots
+       }
        if desiredPatroniConfig.LoopWait > 0 && desiredPatroniConfig.LoopWait != effectivePatroniConfig.LoopWait {
                configToSet["loop_wait"] = desiredPatroniConfig.LoopWait
        }
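
The guard in the patch above can be tried in isolation. The sketch below assumes IgnoreSlots is a `[]map[string]string` (the actual field type in the PR may differ) and uses a hypothetical helper name, `needsIgnoreSlotsUpdate`. It points at two edge cases that may be worth checking: because of the nil check, removing ignore_slots from the manifest would not trigger an update, and `reflect.DeepEqual` treats a nil slice and an empty slice as different.

```go
package main

import (
	"fmt"
	"reflect"
)

// needsIgnoreSlotsUpdate mirrors the condition from the patch: update only
// when a desired value is set and differs from the effective one.
func needsIgnoreSlotsUpdate(desired, effective []map[string]string) bool {
	return desired != nil && !reflect.DeepEqual(desired, effective)
}

func main() {
	logical := []map[string]string{{"type": "logical"}}

	// Newly added in the manifest: update is needed.
	fmt.Println(needsIgnoreSlotsUpdate(logical, nil))

	// Already applied: no update.
	fmt.Println(needsIgnoreSlotsUpdate(logical, []map[string]string{{"type": "logical"}}))

	// Removed from the manifest: the nil check suppresses the update.
	fmt.Println(needsIgnoreSlotsUpdate(nil, logical))

	// Empty but non-nil desired list versus nil effective list: DeepEqual
	// reports them as different, so an update is requested.
	fmt.Println(needsIgnoreSlotsUpdate([]map[string]string{}, nil))
}
```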

@FxKu
Member

FxKu commented May 29, 2026

@baznikin The PostgresCRDResourceValidation does not exist anymore in the latest main branch. But you're right about the missing sync code. @PavelZaytsev can you add this missing piece and cover the option in the docs?
