Details
Task
Resolution: Fixed
P2: Important
Description
I can sometimes hear the database "thrashing" the disk, which leads to slowness. To debug why this happens, I need monitoring metrics:
- How much of shared_buffers is occupied by tables and indices
- How much of that is taken up by each specific table or index
I suspect that when the thrashing happens, one particular index has grown too large and evicted everything else from the buffer cache.
Additional metrics that would be nice:
- How much read/write I/O each table or index accounts for.
In other words, I want telegraf to send several numbers for each and every table and index.
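A rough sketch of how the per-relation numbers could be collected, assuming the pg_buffercache extension is installed. The query follows the pattern from the pg_buffercache documentation; the measurement name `pg_buffercache`, the tag names, and the helper functions are made up for illustration. Note that `pg_statio_user_indexes` only covers block reads and cache hits, so it would not answer the write side.

```python
# Buffers currently held in shared_buffers, per table/index of the current database.
# Requires the pg_buffercache extension.
BUFFER_USAGE_SQL = """
SELECT n.nspname, c.relname, c.relkind, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c
  ON b.relfilenode = pg_relation_filenode(c.oid)
 AND b.reldatabase IN (0, (SELECT oid FROM pg_database
                           WHERE datname = current_database()))
JOIN pg_namespace n ON n.oid = c.relnamespace
GROUP BY n.nspname, c.relname, c.relkind
"""

# Blocks read from disk vs. found in cache, per index (reads only, no writes).
INDEX_IO_SQL = """
SELECT schemaname, indexrelname, idx_blks_read, idx_blks_hit
FROM pg_statio_user_indexes
"""


def _escape_tag(value):
    # InfluxDB line protocol: commas, equals signs and spaces in tag values
    # must be backslash-escaped.
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def to_line_protocol(measurement, tags, fields):
    """Format one metric as an InfluxDB line-protocol line (integer fields only)."""
    tag_part = ",".join(
        f"{key}={_escape_tag(str(val))}" for key, val in sorted(tags.items())
    )
    field_part = ",".join(
        f"{key}={int(val)}i" for key, val in sorted(fields.items())
    )
    return f"{measurement},{tag_part} {field_part}"


def buffer_rows_to_lines(rows):
    """Convert rows of (schema, relation, relkind, buffers) into metric lines."""
    return [
        to_line_protocol(
            "pg_buffercache",  # hypothetical measurement name
            {"schema": schema, "relation": relation, "kind": kind},
            {"buffers": buffers},
        )
        for schema, relation, kind, buffers in rows
    ]
```

The rows could come from any DB-API cursor that executes `BUFFER_USAGE_SQL`. Printing the resulting lines from a script run by Telegraf's exec input with `data_format = "influx"` would be one way to ship them. Multiplying `buffers` by the block size (8 kB by default) gives bytes.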
TODO
- Create a PostgreSQL user with monitoring rights only, and use it in Telegraf
- Write queries that do the monitoring (see comments below), run them regularly in Telegraf
- Create Grafana dashboards for these new metrics
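For the first TODO item, a possible starting point, assuming PostgreSQL 10 or later, where the built-in `pg_monitor` role exists and grants read access to the statistics views and to pg_buffercache. The role name `telegraf` and the helper function are hypothetical. The statements would have to be run by a superuser, and the password is deliberately left out and would be set separately.

```python
import re
from typing import List


def monitoring_role_statements(role: str) -> List[str]:
    """Return SQL statements that set up a login role limited to monitoring."""
    # Accept only plain lowercase identifiers so the name can be
    # interpolated into SQL without quoting.
    if not re.fullmatch(r"[a-z_][a-z0-9_]*", role):
        raise ValueError(f"unsupported role name: {role!r}")
    return [
        # Needed for the per-relation shared_buffers breakdown.
        "CREATE EXTENSION IF NOT EXISTS pg_buffercache;",
        # No password here; set one separately (e.g. with \password in psql).
        f"CREATE ROLE {role} LOGIN;",
        # pg_monitor is a predefined role in PostgreSQL 10+.
        f"GRANT pg_monitor TO {role};",
    ]


for statement in monitoring_role_statements("telegraf"):
    print(statement)
```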