Details
- Bug
- Resolution: Done
- Not Evaluated
- 2011q1
- None
- a978295e9
Description
Harmattan test infrastructure may go down for maintenance in the middle of a build.
Currently our test scripts attempt to handle this in a simplistic way which does not work correctly.
For example, from http://pulse.test.qt.nokia.com:8080/browse/projects/Multimedia/builds/946/details/maemo6%20bifh/ - http://pulse.test.qt.nokia.com:8080/file/artifacts/115527186/output.txt :
(bifhcli dispatches request)
Polling BIFH for status updates on request 607976...
The request's progress can be observed at (...)
current
current
current
current
current
current
current
current
current
Will try again in 256 seconds
Will try again in 512 seconds
Will try again in 1024 seconds
Will try again in 1024 seconds
Will try again in 1024 seconds
Will try again in 1024 seconds
Will try again in 1024 seconds
Will try again in 1024 seconds
Will try again in 1024 seconds
Will try again in 1024 seconds
Will try again in 1024 seconds
Will try again in 1024 seconds
Will try again in 1024 seconds
<Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
<Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
<Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
<Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
<Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
<Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
<Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
<Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
<Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
<Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
<Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
<Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
<Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
current
(... and later:)
BIFH request failed (state
<Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
<Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
<Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
<Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
<Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
<Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
<Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
<Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
<Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
<Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
<Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
<Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
<Fault 1: "<type 'exceptions.Exception'>:Maintenance break.">
current)
What happened here is that our bifhcli wrapper, while polling for status updates, received a maintenance break exception at some point.
It handled this by simply retrying repeatedly until the maintenance break was over and the state "current" was returned again. However, the "Maintenance break." fault text apparently went to stdout, and hence was incorrectly treated as part of the "state" description.
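The stdout problem described above can be illustrated with a minimal sketch. It is not the actual bifhcli wrapper: `MaintenanceBreak`, `poll_state` and the `get_state` callable are hypothetical names, and the 256/512/1024 second delays are simply taken from the log excerpt. The point is only that the fault text and the retry notices go to a separate diagnostic stream, so the value returned as the state can never contain them.

```python
import sys
import time


class MaintenanceBreak(Exception):
    """Hypothetical stand-in for the fault reported during a maintenance break."""


def poll_state(get_state, first_delay=256, max_delay=1024, max_attempts=14,
               sleep=time.sleep, log=sys.stderr):
    """Return the request state, retrying with capped exponential backoff.

    get_state is an assumed callable that returns the state string or
    raises MaintenanceBreak. Diagnostics are written to `log` (stderr by
    default), never mixed into the returned state.
    """
    delay = first_delay
    for _ in range(max_attempts):
        try:
            return get_state()
        except MaintenanceBreak as fault:
            # Report the fault on the diagnostic stream only.
            print(fault, file=log)
            print("Will try again in %d seconds" % delay, file=log)
            sleep(delay)
            # Double the delay, but never exceed the cap.
            delay = min(delay * 2, max_delay)
    raise MaintenanceBreak("gave up after %d attempts" % max_attempts)
```

A caller would then read the state from the return value alone and leave the diagnostic stream for the build log.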
Our bifhcli wrapper seems to behave incorrectly here, but fixing that alone would be insufficient: in any case the request had effectively been destroyed by the maintenance break (it stalled forever and never completed).
I suggest that, instead of doing this kind of "dumb" retry when a maintenance break occurs, we always cancel and restart the entire request. A maintenance break seems to imply "undefined" behavior for any requests in progress when the break occurs, so doing a total restart is the safest option.
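As a rough sketch of the suggested behavior, assuming the wrapper can submit, poll and cancel a request through three separate operations: the callables `submit`, `poll` and `cancel`, the `MaintenanceBreak` exception and the `max_restarts` limit are all hypothetical and do not reflect the real bifhcli interface.

```python
class MaintenanceBreak(Exception):
    """Hypothetical stand-in for the fault reported during a maintenance break."""


def run_request(submit, poll, cancel, max_restarts=3):
    """Run a request to completion, restarting it from scratch on a break.

    submit() is assumed to return a request id, poll(request_id) to block
    until the request finishes and return its final state, and
    cancel(request_id) to discard a request. A maintenance break is treated
    as leaving the in-flight request in an undefined state, so the request
    is cancelled and a fresh one is submitted instead of polling again.
    """
    for _ in range(max_restarts + 1):
        request_id = submit()
        try:
            return poll(request_id)
        except MaintenanceBreak:
            # Do not trust the interrupted request; drop it and start over.
            cancel(request_id)
    raise RuntimeError("request still failing after %d restarts" % max_restarts)
```

In practice the wrapper would probably also need to wait for the break to end before resubmitting, and to tolerate `cancel` itself failing while the infrastructure is down; both are left out here to keep the sketch short.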
Issue Links
- relates to QTQAINFRA-74 Harmattan: recover from BIFH maintenance break at beginning of build (Closed)