  1. Qt Quality Assurance Infrastructure
  2. QTQAINFRA-5601

KVM hosts sometimes become I/O blocked and cause VM failures

Details

    • Bug
    • Resolution: Fixed
    • P2: Important
    • None
    • production
    • None

    Description

      Occasionally the CI sees a burst of failures: VMs report I/O errors, permission errors, freezes, etc. This appears to be caused by the KVM hosts momentarily becoming blocked on I/O; the root cause is not yet known.

      When the issue happens, the hosts log messages such as "NFS: __nfs4_reclaim_open_state: Lock reclaim failed!".
      The host metrics show NFS problems in the same time window:
      https://inframetrics.intra.qt.io/d/h1lbJxcWz/detailed-host-data-for-performance-debugging?orgId=1&from=1685007032885&to=1685007251511&var-host=ace-fawn&var-interval=10s&var-ret_policy=autogen&var-net_interface=All&var-perops=WRITE
      Another timestamp:
      https://inframetrics.intra.qt.io/d/h1lbJxcWz/detailed-host-data-for-performance-debugging?orgId=1&from=1684831180825&to=1684831987823 

      The metrics also show that something on the host becomes blocked:
      https://inframetrics.intra.qt.io/d/nOAsINNZz/telegraf-hosts?orgId=1&var-server=ace-fawn&var-inter=1s&from=1685007013920&to=1685007279879&viewPanel=28239 
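
      To correlate CI failures with the NFS message quoted above, the host kernel logs can be searched for it. The following is a minimal sketch, not part of the original report: the function name find_reclaim_failures is hypothetical, and it assumes the kernel log is available as plain text lines containing the message exactly as quoted in this issue.

```python
from typing import Iterable, List

# Message text copied from this issue. Whatever prefix the kernel log puts in
# front of it (timestamp, hostname, ...) is not assumed and is left untouched.
RECLAIM_FAILED = "NFS: __nfs4_reclaim_open_state: Lock reclaim failed!"


def find_reclaim_failures(lines: Iterable[str]) -> List[str]:
    """Return the log lines that contain the NFS lock-reclaim failure message.

    Trailing newlines are stripped; lines are otherwise returned unchanged so
    that any timestamps can be compared against the CI failure times.
    """
    return [line.rstrip("\n") for line in lines if RECLAIM_FAILED in line]
```

      The function accepts any iterable of strings, so it can be fed an open log file or the captured output of a command such as dmesg or journalctl -k on the affected host.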

          People

            tosaraja Tony Sarajärvi
            tosaario Toni Saario

              Gerrit Reviews

                There are no open Gerrit changes