Details
- Suggestion
- Resolution: Done
- P2: Important
- None
- unversioned
- None
Description
See QTQAINFRA-1754 for description of the problem we currently face.
Whenever a host goes into an ERROR state in ONE, the virtual machines on it are still treated as "running" until they time out and the build fails. If Coin knew that the host had died entirely, it could automatically restart the builds that were running on that host's VMs, instead of failing them.
Definition of done for this ticket: instead of 5h timeouts in Coin when a VM host crashes, the work items running on that host are restarted.
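The behaviour asked for above could be sketched roughly as follows. This is only an illustration under stated assumptions, not Coin's actual implementation: the `WorkItem` class, the `restart_items_on_failed_hosts` function, the `host_states` mapping, the "queued" state and the `max_restarts` cap are all hypothetical names introduced here. Only the "ERROR" host state comes from the description.

```python
from dataclasses import dataclass


@dataclass
class WorkItem:
    # Hypothetical stand-in for a Coin work item; not Coin's real data model.
    name: str
    host: str                # VM host that the work item's VM runs on
    state: str = "running"   # assumed states: "running" or "queued"
    restarts: int = 0


def restart_items_on_failed_hosts(host_states, work_items, max_restarts=2):
    """Requeue running work items whose VM host is in the ERROR state.

    host_states maps host name -> state string as reported by ONE.
    Only the "ERROR" value is taken from the ticket; other values are assumed.
    Returns the list of work items that were requeued.
    """
    failed_hosts = {host for host, state in host_states.items() if state == "ERROR"}
    restarted = []
    for item in work_items:
        if item.state != "running" or item.host not in failed_hosts:
            continue
        if item.restarts >= max_restarts:
            # Assumed safeguard against restart loops: leave the item alone
            # so it fails through the existing timeout path.
            continue
        item.state = "queued"
        item.restarts += 1
        restarted.append(item)
    return restarted
```

In this sketch, a periodic check would call `restart_items_on_failed_hosts` with the latest host states, so affected work items are requeued as soon as the host is seen in ERROR rather than after the timeout. The restart cap is an added assumption and is not part of the ticket.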
Issue Links
- relates to
  - COIN-157 Ideas how to make CI more reliable (Closed)
  - QTQAINFRA-1754 VM host that has crashed should be automatically rebooted (Closed)
  - QTQAINFRA-1778 VM host crash reasons should be automatically categorized (Closed)