Details
- Bug
- Resolution: Fixed
- P1: Critical
- None
- production
- None
- 2bc191db8 (dev)
Description
The CI pipeline is currently bottlenecked by 32-vCPU jobs that wait excessively long to acquire VMs. As a result, only about 50 VMs are running while more than 1000 jobs are queued, even though the hosts still have available capacity.
NUMA pinning is a major contributor: a 32-vCPU job requires all 32 vCPUs to be available on a single physical CPU. For instance, on an 80-vCPU host with 2 physical CPUs (40 vCPUs each), 2 vCPUs are reserved for the host, leaving only 38 per physical CPU for VMs. A 32-vCPU job therefore fits only if at most 6 vCPUs on that physical CPU are already in use by other VMs, so the host must be nearly empty to accommodate it.
This constraint significantly limits scheduling flexibility and needs to be addressed to prevent CI delays and better utilize available resources.
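The arithmetic behind this constraint can be sketched in a few lines of Python. This is only an illustration of the capacity math described above, not the actual OpenNebula scheduler logic. The names `NumaNode`, `fits_pinned` and `fits_unpinned` are hypothetical, the example load of 8 vCPUs per physical CPU is made up, and the sketch assumes the 2 reserved vCPUs apply to each physical CPU (which is what gives 38 usable vCPUs per CPU).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NumaNode:
    """One physical CPU (NUMA node) of a host. Hypothetical model."""
    total_vcpus: int
    reserved_vcpus: int  # assumed to be reserved per physical CPU
    used_vcpus: int = 0

    @property
    def free_vcpus(self) -> int:
        return self.total_vcpus - self.reserved_vcpus - self.used_vcpus


def fits_pinned(nodes: List[NumaNode], requested: int) -> bool:
    """With NUMA pinning, the whole request must fit on a single node."""
    return any(node.free_vcpus >= requested for node in nodes)


def fits_unpinned(nodes: List[NumaNode], requested: int) -> bool:
    """Without pinning, only the host-wide free total matters (simplified)."""
    return sum(node.free_vcpus for node in nodes) >= requested


# 80-vCPU host: 2 physical CPUs with 40 vCPUs each, 2 reserved on each,
# and a small 8-vCPU load already running on each physical CPU.
host = [NumaNode(40, 2, used_vcpus=8), NumaNode(40, 2, used_vcpus=8)]

print(fits_pinned(host, 32))    # False: only 30 vCPUs free per physical CPU
print(fits_unpinned(host, 32))  # True: 60 vCPUs free across the host
```

In this example the host has 60 free vCPUs in total, yet a pinned 32-vCPU job cannot be placed because neither physical CPU has 32 free on its own.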
Issue Links
- resulted in: QTQAINFRA-7193 OpenNebula not scheduling VMs fast enough (Closed)