WIP: OCPSTRAT-3036: Rebase 1.36.2 #2653
Add Workload-Aware Preemption fields to Workload and PodGroup APIs
…-owners-typo Fix malformed OWNERS entries used by maintainers
…ader-test-race apiserver: tolerate APF header race with timeout handler in priority-and-fairness tests
…Policy is Always'
scheduler: fix race in DRA pending allocation sharing
Flaky test fix for 'should restart failing container when pod restartPolicy is Always'
Revert "Switch PLEGOnDemandRelist default to `false` for 1.36"
The kubelet status manager was not preserving the pod.status.nodeAllocatableResourceClaimStatuses field set by the scheduler during pod status merges. This caused the information to be destroyed by the kubelet's next status sync, making the field always appear empty. Add the same preservation pattern already used for ResourceClaimStatuses and ExtendedResourceClaimStatus to both mergePodStatus() and isPodStatusByKubeletEqual(). Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
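The preservation pattern described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the kubelet's code: the `PodStatus` type, its `[]string` field type, and the helpers `mergeStatus` and `equalByKubelet` are hypothetical stand-ins for the real types and for mergePodStatus() and isPodStatusByKubeletEqual().

```go
package main

import "fmt"

// PodStatus is a hypothetical, heavily simplified stand-in for the real pod status.
type PodStatus struct {
	Phase string // kubelet-owned in this sketch
	// Scheduler-owned in this sketch; the real field type is not shown in the source.
	NodeAllocatableResourceClaimStatuses []string
}

// mergeStatus takes the kubelet's new status but carries over the
// scheduler-owned field from the status already stored in the API.
func mergeStatus(oldStatus, newStatus PodStatus) PodStatus {
	merged := newStatus
	merged.NodeAllocatableResourceClaimStatuses = oldStatus.NodeAllocatableResourceClaimStatuses
	return merged
}

// equalByKubelet compares only kubelet-owned fields, so a difference in the
// scheduler-owned field does not count as a kubelet-side change.
func equalByKubelet(a, b PodStatus) bool {
	return a.Phase == b.Phase
}

func main() {
	fromAPI := PodStatus{Phase: "Pending", NodeAllocatableResourceClaimStatuses: []string{"claim-a"}}
	fromKubelet := PodStatus{Phase: "Running"}

	merged := mergeStatus(fromAPI, fromKubelet)
	fmt.Println(merged.Phase, merged.NodeAllocatableResourceClaimStatuses) // prints: Running [claim-a]
}
```

Without the copy in `mergeStatus`, the kubelet's status (which never sets the field) would overwrite the scheduler's value on the next sync, which is the symptom the commit describes.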
Kubelet: Add alpha-2 stage implementation for UserNamespacesHostNetworkSupport feature gate
This commit introduces the DRAResourceClaimGranularStatusAuthorization feature gate (Beta in 1.36) to enforce fine-grained authorization checks on ResourceClaim status updates. Previously, 'update' permission on 'resourceclaims/status' allowed modifying the entire status. To enforce the principle of least privilege for DRA drivers and the scheduler, this change introduces synthetic subresources and verb prefixes:
- 'resourceclaims/binding': Required to update 'status.allocation' and 'status.reservedFor'.
- 'resourceclaims/driver': Required to update 'status.devices'. Evaluated on a per-driver basis using 'associated-node:<verb>' (for node-local ServiceAccounts) or 'arbitrary-node:<verb>' (for cluster-wide controllers).
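To make the mapping from changed status fields to required permissions concrete, here is a small sketch. It is an illustration only, not the apiserver implementation: the `Check` type, the `requiredChecks` function, the `nodeLocal` flag, and the driver name `gpu.example.com` are hypothetical, and the sketch assumes the binding check uses the unprefixed verb, which the commit message does not state explicitly.

```go
package main

import "fmt"

// Check is a hypothetical description of one authorization check.
type Check struct {
	Subresource string
	Verb        string
	Driver      string // set only for per-driver checks
}

// requiredChecks lists the checks implied by the commit message:
//   - changes to status.allocation or status.reservedFor need resourceclaims/binding
//   - changes to status.devices need resourceclaims/driver, once per affected driver,
//     with an associated-node: or arbitrary-node: verb prefix
func requiredChecks(bindingChanged bool, changedDrivers []string, verb string, nodeLocal bool) []Check {
	var checks []Check
	if bindingChanged {
		// Assumption: no verb prefix for the binding subresource.
		checks = append(checks, Check{Subresource: "resourceclaims/binding", Verb: verb})
	}
	prefix := "arbitrary-node:" // cluster-wide controllers
	if nodeLocal {
		prefix = "associated-node:" // node-local ServiceAccounts
	}
	for _, driver := range changedDrivers {
		checks = append(checks, Check{
			Subresource: "resourceclaims/driver",
			Verb:        prefix + verb,
			Driver:      driver,
		})
	}
	return checks
}

func main() {
	// A node-local driver updating only its own entry in status.devices.
	for _, c := range requiredChecks(false, []string{"gpu.example.com"}, "update", true) {
		fmt.Printf("%+v\n", c)
	}
}
```

In this sketch a request that touches both the allocation and one driver's devices would need two separate checks to pass, which is the least-privilege split the commit message describes.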
Fine-grained Authorization for ResourceClaim Status Updates
Fix race condition in updating the PodStatus cache
…actuated resources Signed-off-by: ndixita <ndixita@google.com>
set InPlacePodLevelResourcesVerticalScaling to false if needed
…ager Metrics

The Memory Manager Metrics BeforeEach asserts that zero pods are running on the node after a kubelet config update. This hard assertion flakes when a preceding serial test's namespace deletion hasn't completed yet: framework namespace cleanup is async and the kubelet restart in updateKubeletConfig can delay in-flight pod termination. CI logs show leftover pods from MemoryQoS tests (memqos-burstable, memqos-no-limit, etc.), Probe Stress tests (50-container pods), and Summary API PSI tests (memory-pressure-pod), all still Running when the assertion fires 4-7ms after the previous test finishes.

Replace the immediate Expect(count).To(BeZero()) with an Eventually poll (2 minute timeout, 5 second interval) that gives pods time to drain after the kubelet restart. The existing printAllPodsOnNode diagnostic output is preserved inside the poll for debugging.

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
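The change from a one-shot assertion to a poll can be sketched with the standard library alone. This is not the test's actual code (which, per the commit message, uses Eventually); `waitForZero` and the fake `count` function are hypothetical, and the demo shortens the 2 minute timeout and 5 second interval so it finishes quickly.

```go
package main

import (
	"fmt"
	"time"
)

// waitForZero polls count until it reports 0 or the timeout elapses.
// It returns nil on success and an error describing the last count otherwise.
func waitForZero(count func() int, timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		n := count()
		if n == 0 {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("still %d pods running after %s", n, timeout)
		}
		time.Sleep(interval)
	}
}

func main() {
	remaining := 3 // pretend three pods from a previous test are still terminating
	count := func() int {
		n := remaining
		if remaining > 0 {
			remaining-- // one pod finishes terminating between polls
		}
		return n
	}

	// The commit uses a 2 minute timeout and a 5 second interval; shortened here.
	err := waitForZero(count, time.Second, 10*time.Millisecond)
	fmt.Println("drained:", err == nil)
}
```

An immediate check would have seen three pods and failed; the poll succeeds once the count reaches zero and only fails if pods are still present at the deadline.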
Pod events fix
…rics-cleanup e2e_node: wait for pod drain before asserting zero pods in Memory Manager Metrics
Fix flakiness in integration test for TopologyAwareScheduling with Basic Policy
…urce-test Deflake TestPodSubresourceAuth by waiting for effective permissions before testing
Signed-off-by: Alay Patel <alayp@nvidia.com>
…di-spec kep-5304: bump cdi spec version to 0.5.0
…deAllocatableResourceClaimStatuses kubelet: do not destroy nodeAllocatableResourceClaimStatuses
Signed-off-by: yashsingh74 <yashsingh1774@gmail.com>
Update CNI plugins to v1.9.1
/retest
/test images
/test images
/retest
/payload 5.0 nightly informing
@jubittajohn: trigger 69 job(s) of type informing for the nightly release of OCP 5.0
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/29943c40-68a1-11f1-9dca-595e17569219-0
/test all
/testwith openshift/kubernetes/master/k8s-e2e-gcp-ovn openshift/release#80600
/testwith openshift/kubernetes/master/e2e-aws-ovn-cgroupsv2 openshift/release#80600
/testwith openshift/kubernetes/master/e2e-aws-ovn-techpreview openshift/release#80600
To be squashed with the following commit later: "UPSTREAM: <carry>: Add OpenShift tooling, images, configs and docs" Signed-off-by: jubittajohn <jujohn@redhat.com>
…er_manager_linux_test.go Squash into: UPSTREAM: <carry>: disable load balancing on created cgroups when managed is enabled
…s in flagz_test.go and statusz_test.go
Squash into: UPSTREAM: <carry>: apiserver: add system_client=kube-{apiserver,cm,s} to apiserver_request_total
…acheGC is enabled Squash into UPSTREAM: <carry>: create termination events
Squash into: UPSTREAM: <carry>: add management support to kubelet
Signed-off-by: jubittajohn <jujohn@redhat.com>
… driver when not enabled The upstream csi-hostpath-plugin.yaml manifest now includes a csi-snapshot-metadata sidecar container and volume (added in k/k#130918). Upstream PR k/k#137057 added conditional stripping of these when CapSnapshotMetadata is not enabled, but only for the upstream hostpathCSIDriver. The OpenShift-specific groupSnapshotHostpathCSIDriver was never updated, causing the driver pod to fail with "secret csi-snapshot-metadata-server-certs not found" and all csi-hostpath-groupsnapshot tests to fail in techpreview jobs. Signed-off-by: jubittajohn <jujohn@redhat.com>
Signed-off-by: Sai Ramesh Vanka <svanka@redhat.com>
Signed-off-by: jubittajohn <jujohn@redhat.com>
Signed-off-by: jubittajohn <jujohn@redhat.com>
@jubittajohn: The following tests failed, say