Details
Bug
Resolution: Unresolved
P2: Important
None
unversioned
None
Description
The script that feeds test results from Coin into PostgreSQL is coin_scraper. It depends on all 3 Coin instances, and if any one of them is not up and running, test result collection is paused.
This caused an outage of almost 2 weeks because Coin's TQTC mirror was down.
This happens because the Coin API does not tell me whether the testresults tarball exists, or on which server, so the script must check all 3 locations before giving up.
More detailed logic:
For each finished workitem, the script needs to fetch the testresults tarball in order to parse the XML test logs. It does this by polling all 3 Coin instances for a file named testresults.tar.gz or testresults.tar.zst.
Only if all 3 servers respond with 404 can the script continue and "forget" the workitem as one with missing test results.
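To make the rule above concrete, here is a minimal Python sketch of the decision, not the actual coin_scraper code. The server URLs, the URL layout, and the names `locate_tarball`, `Outcome` and `get_status` are all hypothetical; `get_status` stands in for whatever HTTP call the script really makes and is assumed to return the status code, or None when the server cannot be reached.

```python
from enum import Enum
from typing import Callable, Iterable, Optional, Tuple

# Assumed file names, derived from "testresults.tar.gz/zst" in the description.
TARBALL_NAMES = ("testresults.tar.gz", "testresults.tar.zst")


class Outcome(Enum):
    FOUND = "found"      # a server has the tarball
    MISSING = "missing"  # every server answered 404 for every candidate name
    RETRY = "retry"      # at least one server gave no definitive answer


def locate_tarball(
    servers: Iterable[str],
    workitem_path: str,
    get_status: Callable[[str], Optional[int]],
) -> Tuple[Outcome, Optional[str]]:
    """Decide what to do with one finished workitem.

    get_status(url) is a hypothetical helper returning the HTTP status
    code for url, or None if the server is unreachable.
    """
    undecided = False
    for server in servers:
        for name in TARBALL_NAMES:
            # Hypothetical URL layout; the real Coin paths may differ.
            url = f"{server}/{workitem_path}/{name}"
            status = get_status(url)
            if status == 200:
                return Outcome.FOUND, url
            if status != 404:
                # Timeout, 5xx, etc.: the tarball might still be there.
                undecided = True
    if undecided:
        # Cannot "forget" the workitem: one location is still unknown.
        return Outcome.RETRY, None
    return Outcome.MISSING, None
```

In this sketch a single unreachable server is enough to produce `RETRY` whenever the tarball is not found elsewhere, which is the situation that led to the outage described above.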
Ideas are welcome to avoid such outages in the future.