Details
Task
Resolution: Unresolved
P3: Somewhat important
Description
Currently, Coin's gcvms process peaks at 60 GB of RAM usage because it keeps too many task thrift files in memory.
According to the docstring in gcvms.go, this is what it does:
Usage: coin gcvms [--dry-run]
Goes through all tier2 disk images and deletes images that do not meet the following criteria:
- The image has been used** within 4 days unless they belong to the last successful build of any module.
- The image has been used** within 90 days.
(**An image is considered used if new VMs have been instantiated from it.)
I had a chat with tosaario while trying to figure out whether this can be improved, and we came up with a better algorithm that keeps the same functionality. I am putting the notes here in raw form so they are not lost. This is intended only as very generic pseudo-code; feel free to edit and correct it.
New gcvms algorithm
- Read all tier2 images, either from disk or from OpenNebula. Store the template names as keys in a map vmTemplates whose values are a struct with the fields:
  - lastTimeUsed: timestamp
- Initialize a map for saving: latestSuccessfulTask[project][branch]
- Iterate i through all tasks.
  - If task i is successful and more recent than latestSuccessfulTask[project][branch], then replace that value in the map.
  - Iterate j through all VM templates of each task.
    - If the VM template is in your vmTemplates map (if not, it means it's too old - already GC'd), then:
      - If date last used is after lastTimeUsed, then update lastTimeUsed in vmTemplates[ task[i].template[j] ]
- Go through the vmTemplates map and delete the ones not fulfilling your criteria.
Gerrit Reviews
For Gerrit Dashboard: COIN-1080

# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
501845,1 | Refactor GCVMs | master | qtqa/tqtc-coin-ci | NEW | 0 | 0 |