Details

    • Bug
    • Resolution: Fixed
    • P1: Critical
    • None
    • production
    • None
    • 2bc191db8 (dev)

    Description

      The CI pipeline is currently bottlenecked because 32 vCPU jobs wait excessively long to acquire VMs. As a result, only a small number of VMs (~50) are running while over 1000 jobs are queued, even though the hosts have capacity available.

      NUMA pinning is a major contributor: a 32 vCPU job requires all 32 vCPUs to be available on a single physical CPU. For instance, on an 80 vCPU host with 2 physical CPUs (40 vCPUs each), 2 vCPUs on each physical CPU are reserved for the host, leaving only 38 per physical CPU for VMs. A 32 vCPU job therefore fits only when one physical CPU has at most 6 vCPUs in use by other VMs, so the host must be nearly empty to accommodate it.
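
The arithmetic above can be sketched as a small placement check. This is only an illustration under stated assumptions: the names `NumaNode`, `fits_pinned` and `fits_unpinned` are hypothetical and are not taken from the CI scheduler, and the unpinned check is shown purely for contrast, not as the fix chosen for this issue.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NumaNode:
    """One physical CPU of a host (hypothetical model for illustration)."""
    total_vcpus: int      # vCPUs provided by this physical CPU
    reserved_vcpus: int   # vCPUs held back for the host itself
    used_vcpus: int = 0   # vCPUs already pinned to running VMs

    @property
    def free_vcpus(self) -> int:
        return self.total_vcpus - self.reserved_vcpus - self.used_vcpus


def fits_pinned(nodes: List[NumaNode], job_vcpus: int) -> bool:
    """With NUMA pinning, a single node must hold the whole job."""
    return any(node.free_vcpus >= job_vcpus for node in nodes)


def fits_unpinned(nodes: List[NumaNode], job_vcpus: int) -> bool:
    """Without pinning, free vCPUs from all nodes can be combined."""
    return sum(node.free_vcpus for node in nodes) >= job_vcpus


# The 80 vCPU host from the description, with one 8 vCPU VM on each physical CPU.
host = [NumaNode(40, 2, used_vcpus=8), NumaNode(40, 2, used_vcpus=8)]

print(fits_pinned(host, 32))    # False: each physical CPU has only 30 free vCPUs
print(fits_unpinned(host, 32))  # True: 60 vCPUs are free across the host
```

In this sketch, two small VMs are enough to block a 32 vCPU job on a host that still has 60 free vCPUs, which matches the queueing behaviour described above.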

      This constraint significantly limits scheduling flexibility and needs to be addressed to prevent CI delays and better utilize available resources.

          For Gerrit Dashboard: QTQAINFRA-7194

            People

              jujokini Jukka Jokiniva
              jujokini Jukka Jokiniva
              Votes: 0
              Watchers: 3

                Gerrit Reviews

                  There is 1 open Gerrit change