  1. Qt Quality Assurance Infrastructure
  2. QTQAINFRA-82

Harmattan: recover from BIFH maintenance break during build

Details

    • Bug
    • Resolution: Done
    • Not Evaluated
    • 2011q2
    • 2011q1
    • Test scripts
    • None
    • a978295e9

    Description

      Harmattan test infrastructure may go down for maintenance in the middle of a build.

      Currently, our test scripts handle this in a simplistic way that does not work correctly.

      For example, from http://pulse.test.qt.nokia.com:8080/browse/projects/Multimedia/builds/946/details/maemo6%20bifh/ - http://pulse.test.qt.nokia.com:8080/file/artifacts/115527186/output.txt :

      (bifhcli dispatches request)
      Polling BIFH for status updates on request 607976...
      The request's progress can be observed at (...)
      current
      current
      current
      current
      current
      current
      current
      current
      current
      Will try again in 256 seconds
      Will try again in 512 seconds
      Will try again in 1024 seconds
      Will try again in 1024 seconds
      Will try again in 1024 seconds
      Will try again in 1024 seconds
      Will try again in 1024 seconds
      Will try again in 1024 seconds
      Will try again in 1024 seconds
      Will try again in 1024 seconds
      Will try again in 1024 seconds
      Will try again in 1024 seconds
      Will try again in 1024 seconds
      <Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
      <Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
      <Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
      <Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
      <Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
      <Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
      <Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
      <Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
      <Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
      <Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
      <Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
      <Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
      <Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
      current
      (... and later:)
      BIFH request failed (state <Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
      <Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
      <Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
      <Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
      <Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
      <Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
      <Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
      <Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
      <Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
      <Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
      <Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
      <Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
      <Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
      current)
      

      What happened here is that our bifhcli wrapper, while polling for status updates, received a maintenance break exception at some point.
      It handled this by simply retrying until the maintenance break was over and the state "current" was returned. However, the "Maintenance break" text apparently went to stdout and was therefore incorrectly treated as part of the "state" description.
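
      To illustrate the parsing problem, here is a minimal sketch (in Python, not the language of the actual test scripts) of separating fault lines from the real state. The function name parse_poll_output and the regular expression are assumptions for illustration only; the fault text is taken from the log above.

```python
import re

# Matches the fault lines seen in the log above, e.g.
# <Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
FAULT_RE = re.compile(r"^<Fault \d+: .*Maintenance break\.")


def parse_poll_output(text):
    """Split raw poll output into (state, saw_maintenance_break).

    Hypothetical helper: fault lines are never treated as part of the
    state; the last non-fault, non-empty line is taken as the state.
    """
    state = None
    saw_break = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if FAULT_RE.match(line):
            saw_break = True
        else:
            state = line
    return state, saw_break
```

      With output like the excerpt above (several fault lines followed by "current"), this would report the state as "current" and separately flag that a maintenance break was seen, rather than concatenating everything into one "state" string.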

      Our bifhcli wrapper seems to behave incorrectly here, but fixing that alone would be insufficient: in any case, the request had effectively been destroyed by the maintenance break (it stalled forever and never completed).

      I suggest that, instead of doing this kind of "dumb" retry when a maintenance break occurs, we always cancel and restart the entire request. A maintenance break seems to imply "undefined" behavior for any requests in progress when the break occurs, so doing a total restart is the safest option.
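
      A rough sketch of the suggested cancel-and-restart behaviour follows, again in Python for illustration. Everything here is hypothetical: submit, poll and cancel stand in for whatever the bifhcli wrapper actually provides, MaintenanceBreak is an invented exception type, and treating "current" as the in-progress state is an assumption based on the log above.

```python
import time


class MaintenanceBreak(Exception):
    """Hypothetical: raised when BIFH reports a maintenance break."""


def run_with_restart(submit, poll, cancel, max_restarts=3,
                     poll_interval=60, sleep=time.sleep):
    """Run a request to completion, restarting it after a maintenance break.

    submit() -> request id, poll(id) -> state string, cancel(id) -> None.
    All three callables are placeholders supplied by the caller.
    """
    for _attempt in range(max_restarts + 1):
        request_id = None
        try:
            request_id = submit()
            while True:
                state = poll(request_id)
                if state != "current":  # assumed in-progress state
                    return state
                sleep(poll_interval)
        except MaintenanceBreak:
            # The request's fate is undefined after a break, so do not
            # keep polling it: cancel (best effort) and start over.
            if request_id is not None:
                try:
                    cancel(request_id)
                except Exception:
                    pass  # cancelling may itself fail during the break
            sleep(poll_interval)
    raise RuntimeError("request did not survive %d restarts" % max_restarts)
```

      The restart limit is there so that a very long maintenance window ends in a clear failure instead of an endless loop; the right limit and delay would depend on how long breaks typically last.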

            People

              sunil.thaha Sunil Thaha
              rmcgover Rohan McGovern (Inactive)