Details
- Task
- Resolution: Unresolved
- P1: Critical
Description
Summary
Improve Qt5 integration reliability by addressing CI/test system flakiness and automating restaging.
Description
Integrating Qt5 changes is currently very difficult due to the flakiness of the CI and test systems. A full Qt5 build involves ~3000 work items, and a single failure causes the entire integration to fail. Given the current instability, it's nearly impossible to achieve a successful integration in a single run.
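A rough back-of-the-envelope sketch shows why a single-run success is so unlikely. It assumes every one of the ~3000 work items fails spuriously with the same probability and independently of the others. The flake rates below are illustrative assumptions, not measured Coin data, and the function name is hypothetical.

```python
def integration_success_probability(work_items: int, flake_rate: float) -> float:
    """Probability that every work item passes, assuming independent flakes.

    flake_rate is an assumed per-item probability of a spurious failure.
    """
    if not 0.0 <= flake_rate <= 1.0:
        raise ValueError("flake_rate must be between 0 and 1")
    return (1.0 - flake_rate) ** work_items


# Illustrative rates only; real per-item flake rates would have to be measured.
for rate in (0.0001, 0.001, 0.01):
    p = integration_success_probability(3000, rate)
    print(f"flake rate {rate:.2%}: success probability {p:.4%}")
```

Under these assumptions, a per-item flake rate of only 0.1% already leaves roughly a 5% chance that all 3000 items pass in one run.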
A common workaround is to restage the same set of changes, which reuses artifacts from the previous run and reduces integration time. However, if another integration is staged in between, the artifacts may be lost, forcing a full rebuild.
Manual mitigation steps
- Cancel unwanted integrations
- Restage the exact same set of changes multiple times
Proposed automated solutions
- Implement Coin internal restage for failed work items
- Implement Coin internal restage for failed integrations
- Introduce a queue system to preserve artifact continuity
- Explore other automation strategies
Note: a "Coin internal" restage means Coin retries before the integration is marked as failed in Gerrit, which prevents other integrations from being staged in between.
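The idea of an internal restage for failed work items could be sketched roughly as follows. This is not Coin's actual implementation: `run_with_internal_restage`, `run_item`, and `max_attempts` are hypothetical names, and the sketch does not distinguish between failure types. Results of items that already passed are kept between attempts, standing in for reused artifacts.

```python
from typing import Callable, Dict, Iterable


def run_with_internal_restage(
    work_items: Iterable[str],
    run_item: Callable[[str], bool],
    max_attempts: int = 3,
) -> Dict[str, bool]:
    """Run each work item, rerunning only the ones that failed.

    run_item is a hypothetical callback that returns True when the item
    passes. Nothing is reported externally until this function returns.
    """
    results = {item: False for item in work_items}
    for _attempt in range(max_attempts):
        # Only items that have not passed yet are run again.
        pending = [item for item, passed in results.items() if not passed]
        if not pending:
            break
        for item in pending:
            results[item] = run_item(item)
    return results
```

In this sketch the integration would be reported to Gerrit once, after the function returns, with `all(results.values())` as the final verdict.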
Rerun conditions
To avoid introducing more flakiness, reruns should be triggered only when the failure is unrelated to an actual test failure. Acceptable failure types for reruns include:
- Provisioning failures
- Failed to acquire VM
- Timeout due to "no output received in 15 min"
- Network-related failures
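The rerun conditions above can be expressed as a small decision function. This is a minimal sketch under stated assumptions: `FailureKind`, `RERUNNABLE`, `should_rerun`, and the attempt limit are hypothetical and do not reflect how Coin classifies failures internally.

```python
from enum import Enum, auto


class FailureKind(Enum):
    PROVISIONING = auto()     # provisioning failure
    VM_ACQUISITION = auto()   # failed to acquire VM
    OUTPUT_TIMEOUT = auto()   # "no output received in 15 min"
    NETWORK = auto()          # network-related failure
    TEST_FAILURE = auto()     # an actual test failed
    UNKNOWN = auto()          # anything not classified


# Only infrastructure-type failures qualify for an automatic rerun.
RERUNNABLE = frozenset({
    FailureKind.PROVISIONING,
    FailureKind.VM_ACQUISITION,
    FailureKind.OUTPUT_TIMEOUT,
    FailureKind.NETWORK,
})


def should_rerun(kind: FailureKind, attempts_so_far: int, max_attempts: int = 3) -> bool:
    """Return True if the failed work item may be rerun.

    max_attempts is an assumed cap so that a persistent infrastructure
    problem cannot cause endless reruns.
    """
    return kind in RERUNNABLE and attempts_so_far < max_attempts
```

Treating unclassified failures as not rerunnable is a deliberately conservative choice here, so that real test failures are never hidden by a retry.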
Issue Links
- resulted from QTQAINFRA-7173 Not possible to integrate qt5.git changes due to flaky failures (Closed)