
🌱 Bump CAPI to v1.12.2, k8s to v1.34 and controller-runtime to v0.22.5 #5857

Open
clebs wants to merge 18 commits into kubernetes-sigs:main from clebs:capi-1.12-bump

Conversation

@clebs
Member

@clebs clebs commented Feb 3, 2026

What type of PR is this?
/kind support

What this PR does / why we need it:
This PR bumps CAPI to v1.12.2 and with it, k8s to v1.34.3 and controller-runtime to v0.22.5.
Needed to be up to date with CAPI releases.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #5834

Special notes for your reviewer:
None

TODO
This bump is affected by the following issues, which need to be addressed before this bump passes the regular e2e tests:

Checklist:

  • squashed commits
  • includes documentation
  • includes emoji in title
  • adds unit tests
  • adds or updates e2e tests

Release note:

Bump CAPI to v1.12.2, k8s to v1.34 and controller-runtime to v0.22.5

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/support Categorizes issue or PR as a support question. needs-priority cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 3, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign justinsb for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Feb 3, 2026
Member

@richardcase richardcase left a comment


@clebs - we'll also need to update the CAPI version in other places like the e2e config files. Like this: https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/main/test/e2e/data/e2e_conf.yaml#L30

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 4, 2026
@clebs
Member Author

clebs commented Feb 4, 2026

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Feb 4, 2026
@clebs
Member Author

clebs commented Feb 4, 2026

> @clebs - we'll also need to update the CAPI version in other places like the e2e config files. Like this: https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/main/test/e2e/data/e2e_conf.yaml#L30

@richardcase I have updated CAPI and k8s versions based on previous bump PRs:

  • Updated e2e test CAPI version and upgrade from/to k8s versions.
  • Updated envtest to be in sync with controller-runtime.

PTAL!

@clebs clebs force-pushed the capi-1.12-bump branch 2 times, most recently from b6a0781 to b06184d on February 4, 2026 at 20:42
@clebs
Member Author

clebs commented Feb 5, 2026

/retest

1 similar comment
@clebs
Member Author

clebs commented Feb 5, 2026

/retest

@clebs clebs mentioned this pull request Feb 10, 2026
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. do-not-merge/contains-merge-commits and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Mar 2, 2026
@damdo
Member

damdo commented Mar 6, 2026

/test pull-cluster-api-provider-aws-e2e-blocking

@clebs
Member Author

clebs commented Mar 8, 2026

/test pull-cluster-api-provider-aws-e2e-blocking

@damdo
Member

damdo commented Mar 8, 2026

Hey @clebs we'll need #5869 in first I think, before seeing pull-cluster-api-provider-aws-e2e-blocking hopefully go green

@damdo
Member

damdo commented Mar 9, 2026

@clebs this will need rebasing, then we can retest and see

@clebs
Member Author

clebs commented Apr 8, 2026

/test pull-cluster-api-provider-aws-e2e
/test pull-cluster-api-provider-aws-e2e-eks

@clebs
Member Author

clebs commented Apr 8, 2026

/test pull-cluster-api-provider-aws-e2e-eks

@clebs
Member Author

clebs commented Apr 8, 2026

/retest

@faiq
Contributor

faiq commented Apr 9, 2026

/test pull-cluster-api-provider-aws-e2e-eks

@clebs
Member Author

clebs commented Apr 9, 2026

/retest

@damdo
Member

damdo commented Apr 9, 2026

/test pull-cluster-api-provider-aws-e2e

2 similar comments
@damdo
Member

damdo commented Apr 9, 2026

/test pull-cluster-api-provider-aws-e2e

@clebs
Member Author

clebs commented Apr 10, 2026

/test pull-cluster-api-provider-aws-e2e

@k8s-ci-robot
Contributor

@clebs: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
pull-cluster-api-provider-aws-e2e | d9eaba8 | link | false | /test pull-cluster-api-provider-aws-e2e

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

clebs and others added 18 commits April 22, 2026 11:46
- Update cluster-api to v1.12.2
- Update kubernetes dependencies to v0.34.3:
  - k8s.io/api
  - k8s.io/apiextensions-apiserver
  - k8s.io/apimachinery
  - k8s.io/client-go
  - k8s.io/component-base
- Update controller-runtime to v0.22.5
- Update ginkgo to v2.27.2
- Update gomega to v1.38.2
- Regenerate base CRDs

Signed-off-by: Borja Clemente <bclement@redhat.com>
Signed-off-by: Borja Clemente <borja.clemente@gmail.com>
As we update to kubernetes 1.33+ and introduce the AMI AL2023 templates,
we need to use the NodeadmConfig instead of EKSConfig for bootstrapping.

Signed-off-by: Borja Clemente <borja.clemente@gmail.com>
* test: fix mmp tests

* fix(e2e): remove check to use old secret generation method because templates always use nodeadm now

* fix: remove field cloud init field for machinepool

* fix(e2e): adds al2023 explicitly for AMI type
* feat: convert default eks templates to use nodeadm by default

* test: removes redundant nodeadm test
Background diagnostic goroutines (resource dump every 5s, machine dump
every 60s) call upstream CAPI framework functions (DumpAllResources,
DumpMachines) that use Gomega Expect() assertions internally. When a
cluster is being deleted, these assertions can fail with "not found"
errors. Since these dumps are purely informational, their failures
should not mark the test spec as failed or cause panics.

The previous fix used InterceptGomegaFailure() to catch these failures,
but that function is not goroutine-safe: it temporarily replaces the
global Gomega fail handler, which can race with assertions in the test's
main goroutine. This caused [PANICKED] failures when Eventually()
timeouts in the test goroutine hit the intercepted handler, which panics
with "stop execution" instead of calling ginkgo.Fail().

Replace InterceptGomegaFailure with a goroutine-aware custom fail
handler registered via RegisterFailHandler. The handler uses a sync.Map
to track diagnostic goroutine IDs. When a Gomega assertion fails:
- In a diagnostic goroutine: panic with a sentinel value (without ever
  calling ginkgo.Fail), caught by a per-call recover() which logs a
  warning.
- In any other goroutine: delegate to ginkgo.Fail normally.

This ensures diagnostic dump failures are silently absorbed without
affecting global state or racing with test assertions.

Ref: https://kubernetes.slack.com/archives/CD6U2V71N/p1758795545213209
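The goroutine-aware handler described above can be sketched with stdlib-only stand-ins (no real Ginkgo/Gomega imports; `failHandler`, `runDiagnostic`, and the goroutine-ID parsing are illustrative assumptions, not the PR's actual code):

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"strconv"
	"sync"
)

// goroutineID parses the current goroutine's ID from the stack header
// ("goroutine 12 [running]: ..."). Illustrative only; goroutine IDs are
// not a stable public API in Go.
func goroutineID() uint64 {
	buf := make([]byte, 64)
	buf = buf[:runtime.Stack(buf, false)]
	buf = bytes.TrimPrefix(buf, []byte("goroutine "))
	id, _ := strconv.ParseUint(string(buf[:bytes.IndexByte(buf, ' ')]), 10, 64)
	return id
}

// diagnosticGoroutines tracks goroutines whose assertion failures must
// not mark the spec as failed.
var diagnosticGoroutines sync.Map

// stopDiagnostic is the sentinel panic value used instead of ginkgo.Fail.
type stopDiagnostic struct{ msg string }

// failHandler stands in for the handler passed to RegisterFailHandler:
// diagnostic goroutines panic with the sentinel; all others delegate to
// the real fail function.
func failHandler(message string, fail func(string)) {
	if _, ok := diagnosticGoroutines.Load(goroutineID()); ok {
		panic(stopDiagnostic{msg: message})
	}
	fail(message)
}

// runDiagnostic runs fn with the sentinel panic absorbed into a warning,
// mirroring the per-call recover() described in the commit message.
func runDiagnostic(fn func()) (warning string) {
	id := goroutineID()
	diagnosticGoroutines.Store(id, struct{}{})
	defer diagnosticGoroutines.Delete(id)
	defer func() {
		if r := recover(); r != nil {
			if s, ok := r.(stopDiagnostic); ok {
				warning = "diagnostic dump failed: " + s.msg
				return
			}
			panic(r)
		}
	}()
	fn()
	return ""
}

func main() {
	// A "dump" whose assertion fails is absorbed into a warning:
	fmt.Println(runDiagnostic(func() { failHandler("not found", nil) }))
	// The same failure outside a diagnostic context reaches the fail func:
	failHandler("real failure", func(m string) { fmt.Println(m) })
}
```

The key property is that the fail handler never mutates global Gomega state per call, so it cannot race with assertions running in the test's main goroutine.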
…duling

During self-hosted e2e tests, the CAPA controller pod can be evicted
from a control plane node during upgrade drain and rescheduled to a
worker node. The previous gcr.io/k8s-staging-cluster-api/capa-manager:e2e
image reference is a local-only tag that doesn't exist on any registry,
so if the kubelet garbage-collects the pre-loaded image, the pod enters
ImagePullBackOff permanently.

Fix this by dynamically rewriting the CAPA provider component image
replacement to use the ECR Public URL (where the image is already
pushed by ensureTestImageUploaded). With imagePullPolicy: IfNotPresent,
the kubelet uses the local cache when available and falls back to
pulling from ECR Public if needed. The ECR-tagged image is also added
to the Kind bootstrap images list for consistency.
…it race

Lower the healthy threshold from 5 to 2 and the health check interval
from 10s to 5s, reducing the time for the target group to mark the API
server as healthy from 50s to 10s. This leaves a comfortable 50s margin
within kubeadm's default 60s kubernetesAPICallTimeout, preventing the
upload-config/kubeadm phase from hitting a context deadline when the ELB
has not yet started forwarding traffic.

Also increase the unhealthy threshold from 3 to 6 to compensate for the
shorter interval and avoid flapping during transient hiccups.
Add a new `DNSResolutionCheck` field to AWSLoadBalancerSpec that allows
users to control whether the provider verifies DNS resolution of the API
server LB before marking the load balancer as ready and setting the
control plane endpoint.

By default (when unset), the DNS lookup is performed: a fully ready and
routable load balancer is necessary before proceeding further with
cluster bootstrapping. This has become a strict prerequisite for CAPI
clusters installed and bootstrapped via the CAPI kubeadm bootstrap
provider, given the stricter requirements recently introduced by kubeadm
(see: kubernetes/kubeadm#3294).

Setting the field to "None" skips the check entirely, which is useful in
environments with no kubeadm requirements, slow DNS propagation, custom resolvers, or private hosted
zones where the controller node may not be able to resolve the ELB's
FQDN despite it being valid.
…ration

Introduce PendingInstanceRequeue (1s) to replace DefaultReconcilerRequeue
(30s) for the pending-instance polling loop. This reduces the worst-case
delay between an EC2 instance reaching Running state and the controller
registering it with the API server NLB target group.

AWS NLB target registration has an internal propagation delay of
90–180s before health checks begin and traffic is routed. With the
previous 30s requeue, the total time from instance creation to NLB
readiness left insufficient margin for kubeadm's 60-second
upload-config timeout. Measured CI timelines showed failures missing
the deadline by only 2–24s; polling every 1s instead of 30s recovers
~29s of that margin, shifting most clusters from failure to success.
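The polling change can be sketched as follows, under stated assumptions: the two constant names come from the commit message, but `result` and `reconcilePending` are stand-ins for controller-runtime's `ctrl.Result` and CAPA's actual reconciler:

```go
package main

import (
	"fmt"
	"time"
)

// Requeue intervals from the commit message: the old generic value and
// the tighter poll for instances still reaching Running state.
const (
	DefaultReconcilerRequeue = 30 * time.Second
	PendingInstanceRequeue   = 1 * time.Second
)

// result mirrors controller-runtime's ctrl.Result just enough for the
// sketch (the real type lives in sigs.k8s.io/controller-runtime).
type result struct{ RequeueAfter time.Duration }

// reconcilePending returns how soon to re-check a pending instance.
func reconcilePending(state string) result {
	if state == "pending" {
		// Poll fast: worst-case delay between the instance reaching
		// Running and its NLB target registration drops from ~30s to ~1s.
		return result{RequeueAfter: PendingInstanceRequeue}
	}
	return result{}
}

func main() {
	fmt.Println(reconcilePending("pending").RequeueAfter)
	fmt.Println(reconcilePending("running").RequeueAfter)
}
```

The design choice is to narrow the fast requeue to the pending-instance loop only, so other reconcile paths keep the cheaper 30s cadence.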

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/support Categorizes issue or PR as a support question. needs-priority ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Upgrade Cluster API to v1.12.x

6 participants