Details
- Suggestion
- Resolution: Done
- P2: Important
Description
Currently the CI system is monitored via a live TV screen in the CI team's workspace. That is good for checking the state, but it requires someone to actually look at the screen.
Coin status should be monitored with an automatic tool that shows an overall "OK/NOT" status and automatically alerts CI personnel. At least the following should be monitored:
- Disk usage of all the machines in CI
- Coin master
- Compellent
- Coin hosts
- ...
- Status of coin hosts (see also QTQAINFRA-1749)
- Do they respond at all
- Temperatures
- Status of test server (may not be needed after QTPM-242 is done)
- Status of external network connection
- OS repositories
- DNS status (QTBUG-66311)
- Maximum memory usage of agents
- Prevent random build failures by OOM (QTQAINFRA-1765)
- Detect agent failures
- Failure to start test
- Hang while loading sources
- Detect zombie machines
- See coin-secrets/onesetup/list_zombies.sh; zombies are causing at least MAC address conflicts
- ...
This would prevent sudden CI problems by giving warning signs before builds actually start failing.
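As a rough illustration of the overall "OK/NOT" status described above, the following is a minimal Python sketch, not the tool that was actually deployed. All names in it (CheckResult, check_disk_usage, check_dns, run_checks, overall_status) and the 90% disk threshold are assumptions made for the example. Each check returns a small result record, and the overall status is OK only if every check passes.

```python
import shutil
import socket
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CheckResult:
    """Outcome of one monitoring check (hypothetical structure)."""
    name: str
    ok: bool
    detail: str


def check_disk_usage(path: str, max_used_fraction: float = 0.9) -> CheckResult:
    """Pass if the filesystem holding `path` is below the usage threshold.

    The 0.9 default is an assumed threshold, not a value from the issue.
    """
    usage = shutil.disk_usage(path)
    used_fraction = usage.used / usage.total if usage.total else 1.0
    return CheckResult(
        name=f"disk:{path}",
        ok=used_fraction < max_used_fraction,
        detail=f"{used_fraction:.0%} used",
    )


def check_dns(hostname: str) -> CheckResult:
    """Pass if `hostname` resolves through the system resolver."""
    try:
        socket.getaddrinfo(hostname, None)
    except socket.gaierror as exc:
        return CheckResult(f"dns:{hostname}", False, str(exc))
    return CheckResult(f"dns:{hostname}", True, "resolved")


def run_checks(checks: List[Callable[[], CheckResult]]) -> List[CheckResult]:
    """Run every check; a check that raises is reported as a failure."""
    results = []
    for check in checks:
        try:
            results.append(check())
        except Exception as exc:
            name = getattr(check, "__name__", "check")
            results.append(CheckResult(name, False, f"check crashed: {exc}"))
    return results


def overall_status(results: List[CheckResult]) -> str:
    """Collapse all results into the single OK/NOT flag."""
    return "OK" if all(r.ok for r in results) else "NOT"


if __name__ == "__main__":
    # Only a local disk check is run here; a DNS check could be added
    # as `lambda: check_dns("some.host.example")` (hypothetical host).
    results = run_checks([lambda: check_disk_usage("/")])
    print(overall_status(results))
    for r in results:
        if not r.ok:
            # A real tool would alert CI personnel here instead of printing.
            print(f"ALERT {r.name}: {r.detail}")
```

Treating a crashed check as a failure is a deliberate choice in this sketch: a monitor that silently skips a broken check would report OK while being blind to that item.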
Issue Links
- relates to QTQAINFRA-3267 Disk Utilisation Alerts (Withdrawn)