Details

    • Bug
    • Resolution: Fixed
    • P1: Critical
    • None
    • production
    • None
    • 2bc191db8 (dev)

    Description

      We’re currently seeing a bottleneck in the CI pipeline: 32-vCPU jobs wait excessively long to acquire VMs. As a result, only a small number of VMs (~50) are running while over 1000 jobs are queued, even though the hosts have capacity available.

      NUMA pinning is a major contributor: a 32-vCPU job must have all 32 of its vCPUs on a single physical CPU (NUMA node). For instance, an 80-vCPU host has 2 physical CPUs with 40 vCPUs each. 2 vCPUs per physical CPU are reserved for the host, leaving only 38 per physical CPU for VMs. A 32-vCPU job therefore fits only on a physical CPU that has at most 6 vCPUs in use, so one side of the host must be nearly empty before such a job can be scheduled.
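
      The fit rule above can be illustrated with a minimal Python sketch. It assumes the 2 reserved vCPUs are per physical CPU (40 - 2 = 38 schedulable) and uses a simple first-fit placement; the names `NumaNode`, `pick_node` and `max_used_for_fit` are hypothetical and do not come from the actual scheduler. The example shows why summed free capacity is misleading: with 16 vCPUs used on each node, the host has 44 free vCPUs in total, yet a 32-vCPU job cannot be placed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NumaNode:
    """Hypothetical model of one physical CPU (NUMA node) on a host."""
    total_vcpus: int      # vCPUs on this physical CPU
    reserved_vcpus: int   # vCPUs held back for the host (assumed per node)
    used_vcpus: int = 0   # vCPUs already pinned to running VMs

    @property
    def free_vcpus(self) -> int:
        return self.total_vcpus - self.reserved_vcpus - self.used_vcpus


def pick_node(nodes: List[NumaNode], job_vcpus: int) -> Optional[int]:
    """First-fit: return the index of the first node that can hold the
    whole job, or None. With strict NUMA pinning a job may not span
    nodes, so free capacity summed across nodes does not matter."""
    for i, node in enumerate(nodes):
        if node.free_vcpus >= job_vcpus:
            return i
    return None


def max_used_for_fit(node: NumaNode, job_vcpus: int) -> int:
    """Largest number of already-used vCPUs a node can have and still fit the job."""
    return node.total_vcpus - node.reserved_vcpus - job_vcpus


# 80-vCPU host: 2 physical CPUs x 40 vCPUs, 2 reserved per CPU -> 38 schedulable each.
host = [NumaNode(40, 2, used_vcpus=16), NumaNode(40, 2, used_vcpus=16)]
total_free = sum(n.free_vcpus for n in host)

print(total_free)                             # 44 vCPUs free in total
print(pick_node(host, 32))                    # None: no single node has 32 free
print(max_used_for_fit(NumaNode(40, 2), 32))  # 6
```

      Under these assumptions, a node accepts a 32-vCPU job only while it has 6 or fewer vCPUs in use, which matches the "nearly empty" condition described above.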

      This constraint significantly limits scheduling flexibility and must be addressed to prevent CI delays and make better use of the available host capacity.

            People

              jujokini Jukka Jokiniva
              jujokini Jukka Jokiniva
              Votes: 0
              Watchers: 3

                Gerrit Reviews

                  There is 1 open Gerrit change