What happened:
When a Node belongs to multiple ResourceFlavors, as in the following example, Kueue TAS over-subscribes the Node's capacity for Workloads, even though the quota reservation is correct.
# This Node belongs to both of the following ResourceFlavors.
kind: Node
metadata:
  labels:
    example.com/machine: standard
spec:
  taints:
  - effect: NoSchedule
    key: example.com/machine
    value: standard
  # The Node doesn't have example.com/instance-type taints.
...
---
kind: ResourceFlavor
...
spec:
  topologyName: flat
  nodeLabels:
    example.com/machine: standard
  nodeTaints:
  - effect: NoSchedule
    key: example.com/machine
    value: standard
  - effect: NoSchedule
    key: example.com/instance-type
    value: partial-reserved
---
kind: ResourceFlavor
...
spec:
  topologyName: flat
  nodeLabels:
    example.com/machine: standard
  nodeTaints:
  - effect: NoSchedule
    key: example.com/machine
    value: standard
What you expected to happen:
There is no over-subscription in TAS.
How to reproduce it (as minimally and precisely as possible):
I reproduced this problem in the following scheduler unit test case.
In this scenario, the partial-reserved-pending Workload should get a topology assignment on Node x2, because Node x1 is already occupied by the ondemand-admitted-a and ondemand-admitted-b Workloads.
However, the partial-reserved-pending Workload is currently assigned to Node x1 (over-subscription).
#10657
https://github.com/tenzen-y/kueue/blob/6523d77d8534932e91255d25c7333478103d3254/pkg/scheduler/scheduler_tas_test.go#L2391-L2569
Anything else we need to know?:
Again, the quota reservation is correct.
Environment:
- Kubernetes version (use kubectl version):
- Kueue version (use git describe --tags --dirty --always):
- Cloud provider or hardware configuration:
- OS (e.g: cat /etc/os-release):
- Kernel (e.g. uname -a):
- Install tools:
- Others: