  1. Qt Quality Assurance Infrastructure
  2. QTQAINFRA-1754

VM host that has crashed should be automatically rebooted


    • Type: Bug
    • Resolution: Won't Do
    • Priority: P2: Important
    • None
    • unversioned
    • None

      Definition of done: When a host goes down, it should reboot automatically without requiring any manual steps.

      Currently, a VM host crash is detected only through this process:

      1. The VM host crashes
      2. After 5 hours, a developer sees "Timeout" in Coin
      3. The developer restages and is happy
      4. Developer 2 sees "Timeout" in Coin
      5. Developer 2 restages and is happy
      6. Developer 3 sees "Timeout" in Coin
      7. Developer 3 asks on IRC about the timeouts
      8. Someone from CI restarts the host
      What should happen is:

      1. The VM host crashes
      2. Coin detects that the host has crashed and restarts the work items that were running on it (QTQAINFRA-1749)
      3. Automatic monitoring detects that the host is down and restarts it (QTQAINFRA-1754)
      4. The root cause of the problem is diagnosed, categorized, and reported to CI operators (QTQAINFRA-1778)

      Definition of done for this ticket: Automatic monitoring detects that the host is down and restarts it.
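
The monitoring step for this ticket could look roughly like the sketch below. It is only an illustration under stated assumptions, not Coin's actual implementation: `HostWatchdog`, `tcp_probe`, the `max_failures` threshold, and the choice of a TCP check on port 22 are all hypothetical, and the `reboot` callable stands in for whatever out-of-band power control the CI operators actually have available. A host is rebooted only after several consecutive failed probes, so a single dropped check does not trigger a restart.

```python
import socket
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def tcp_probe(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Assumption: the VM hosts accept TCP connections on this port when healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


@dataclass
class HostWatchdog:
    """Hypothetical watchdog: reboots a host after repeated failed probes."""

    hosts: List[str]
    probe: Callable[[str], bool]    # returns True when the host is reachable
    reboot: Callable[[str], None]   # operator-supplied restart action (assumed)
    max_failures: int = 3           # consecutive failures before rebooting
    failures: Dict[str, int] = field(default_factory=dict)

    def check_once(self) -> List[str]:
        """Probe every host once and return the hosts that were rebooted."""
        rebooted = []
        for host in self.hosts:
            if self.probe(host):
                # Healthy again: forget earlier failures.
                self.failures[host] = 0
                continue
            self.failures[host] = self.failures.get(host, 0) + 1
            if self.failures[host] >= self.max_failures:
                self.reboot(host)
                # Start counting afresh so the host has time to come back up.
                self.failures[host] = 0
                rebooted.append(host)
        return rebooted
```

Such a watchdog would be driven by calling `check_once()` at a fixed interval (from a scheduler or a simple loop), with `probe=tcp_probe` and a `reboot` function that wraps the real power-control mechanism. Keeping both as injected callables makes it possible to test the failure-counting logic without touching real hosts.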


            tosaraja Tony Sarajärvi
            sanurmen Sami Nurmenniemi
            Votes: 0
            Watchers: 3

