  1. Coin
  2. COIN-787

In some cases Coin should retry in a new VM before giving up on the whole integration



    • Suggestion
    • Resolution: Unresolved
    • P2: Important
    • None
    • None
    • Other


      If Coin can tell for sure that the build failed because of VM flakiness, then it should spawn a new VM and re-run that specific workitem once more before failing the whole integration.

      For example, see this log: 17 minutes after the workitem started, Coin killed it because of the 15-minute timeout. The rule can be generalised:

      • If the agent kills the build because of a timeout error within (timeout + 5 min) of the start of the build, then spawn a new VM on a different host and re-run the workitem.
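
      The rule above could be sketched roughly as follows. This is only an illustration of the proposed check, not Coin's actual code: the `WorkitemRun` type, its field names, `GRACE`, and `should_retry_in_new_vm` are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Grace period from the "timeout + 5 min" rule; the constant name is hypothetical.
GRACE = timedelta(minutes=5)


@dataclass
class WorkitemRun:
    """Hypothetical summary of one workitem execution on one VM."""
    started_at: datetime
    killed_at: datetime
    timeout: timedelta
    killed_by_timeout: bool
    retries: int = 0  # how many times this workitem was already re-run


def should_retry_in_new_vm(run: WorkitemRun, max_retries: int = 1) -> bool:
    """Return True if the failure matches the proposed retry rule."""
    # Only timeout kills qualify, and the workitem is re-run just once.
    if not run.killed_by_timeout or run.retries >= max_retries:
        return False
    elapsed = run.killed_at - run.started_at
    # Killed within (timeout + 5 min) of the start of the build.
    return elapsed <= run.timeout + GRACE


# The case from the log: killed 17 minutes after start, 15-minute timeout.
start = datetime(2022, 1, 1, 12, 0)
example = WorkitemRun(
    started_at=start,
    killed_at=start + timedelta(minutes=17),
    timeout=timedelta(minutes=15),
    killed_by_timeout=True,
)
```

      With these assumptions, `should_retry_in_new_vm(example)` is true for the case in the log, and false once the workitem has already been retried.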

      More criteria can be added in the future. Besides saving time and resources by avoiding re-running hundreds of workitems, this will also give us an automated way to recognize flakiness caused by CI factors rather than by code. For example:

      • If the re-run workitem mentioned above succeeds, then save all details of the previously failed workitem in influx and mark it as "flaky CI". This will give us many datapoints to investigate.
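
      A minimal sketch of that classification step, assuming the failed run's details are available as a plain dictionary. The function name and the `classification` key are hypothetical, and the actual write to influx is deliberately left out.

```python
from typing import Optional


def flaky_ci_record(failed_run: dict, rerun_succeeded: bool) -> Optional[dict]:
    """Return a record to store if the failure looks like CI flakiness.

    `failed_run` holds whatever details of the first, failed workitem
    are available (hypothetical shape). Nothing is written anywhere here.
    """
    if not rerun_succeeded:
        # The re-run failed too, so this sketch draws no conclusion.
        return None
    # Copy the details and tag them; the input dictionary is not modified.
    return {**failed_run, "classification": "flaky CI"}
```

      For example, `flaky_ci_record({"workitem": "w1"}, True)` yields the same details plus the "flaky CI" tag, while a failed re-run yields nothing to store.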





              jujokini Jukka Jokiniva
              jimis Dimitrios Apostolou


