Monitor all kinds of health statistics for all our build and test VMs. Requirements:

Install a monitoring utility on all of our Tier2 images
- Telegraf? It's the one already in use for the host machines.
- Must be able to run custom monitoring commands at custom intervals, for example "ioping" on a chosen directory, in order to measure the I/O latency.
Send all statistics to a remote database
- InfluxDB most likely, as it's already used for recording the host machines' metrics
- Make sure the VMs don't cache any metrics but send them directly. The build VMs are by definition short-lived: they can be killed the moment something goes wrong, and we definitely don't want to miss those metrics.
Data retention on the database is of secondary importance; it's OK to delete logs after a month or even only a week.
We'll most likely need to assign a unique hostname to each build VM in Coin.
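
As an illustration of the "custom monitoring command" requirement above, here is a minimal sketch of a helper that turns one line of raw ioping statistics into InfluxDB line protocol. It is not the implementation used in Coin. The field order in IOPING_FIELDS is an assumption based on the ioping man page for `ioping -B` and should be checked against the installed version. The measurement name, tag names, function names and sample values are all hypothetical.

```python
import time

# Assumed column order of the raw statistics line printed by `ioping -B`.
# Verify against `man ioping` for the version installed on the image.
IOPING_FIELDS = (
    "requests", "run_time", "iops", "bytes_per_sec",
    "lat_min", "lat_avg", "lat_max", "lat_mdev",
    "total_requests", "total_run_time",
)


def parse_ioping_batch(line):
    """Map one whitespace-separated statistics line to a dict of floats."""
    values = line.split()
    if len(values) != len(IOPING_FIELDS):
        raise ValueError(
            f"expected {len(IOPING_FIELDS)} columns, got {len(values)}"
        )
    return dict(zip(IOPING_FIELDS, (float(v) for v in values)))


def _escape_tag(value):
    """Escape the characters that are special in line-protocol tag values."""
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format a single point; every field is written as a float."""
    tag_part = "".join(
        f",{key}={_escape_tag(val)}" for key, val in sorted(tags.items())
    )
    field_part = ",".join(f"{key}={float(val)}" for key, val in fields.items())
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


# Illustrative sample line only, not a real measurement from a build VM.
sample = "100 26694 3746 15344272 188 267 1923 228 100 26694"
stats = parse_ioping_batch(sample)
print(to_line_protocol(
    "ioping",                                  # hypothetical measurement name
    {"host": "vm-example", "path": "/home"},   # hypothetical tags
    stats,
    time.time_ns(),
))
```

A wrapper like this could be run by Telegraf's `exec` input plugin with `data_format = "influx"` and its own `interval` setting. How often the agent flushes to InfluxDB is governed separately by the agent's `flush_interval`, which would probably need to be kept short for short-lived VMs.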

relates to

QTQAINFRA-3089 Implement centralised log aggregation for all hosts/VMs in Coin and OpenNebula

1.	Install and start Telegraf on all of Coin's short-lived VMs	Closed	Dimitrios Apostolou
2.	Store Coin annotations in InfluxDB	Closed	Daniel Smith
3.	Create dashboards in Grafana with system metrics of VMs during build/test runs	Closed	Dimitrios Apostolou
4.	Create links from Coin's build status page to the Grafana dashboards for hypervisor hosts and VMs	Closed	Toni Saario
5.	InfluxDB is flooded with thousands of TCP connections	Closed	Dimitrios Apostolou

There are no open Gerrit changes

There is 1 closed Gerrit change