  1. Qt Quality Assurance Infrastructure
  2. QTQAINFRA-6172

Create grafana dashboard for random build/infra fails


Details

    • Task
    • Resolution: Unresolved
    • Not Evaluated
    • Metrics / Test Results

    Description

      Nightly health check builds should always pass, so any failure there indicates flakiness.
      To monitor this, a Grafana dashboard is needed. The dashboard should display the failures on a timeline, and failure reasons should be categorized. Currently known reasons are 'sccache error' and 'malformed universal file'; all remaining failures can be categorized as 'other'.

      Modify the Coin log parser and create a separate dashboard to track the number of failed builds:
      Every failed build must be checked, and the results written to the coin_extra database
      Build the dashboard
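A minimal sketch of the per-day, per-category counting that a timeline panel would need. The `(timestamp, reason)` record shape and the function name are hypothetical and do not reflect Coin's actual data model or the coin_extra schema; only the two reason names and the 'other' bucket come from the description.

```python
from collections import Counter
from datetime import datetime

# Reasons named in the description; anything else falls into "other".
KNOWN_REASONS = {"sccache error", "malformed universal file"}


def count_failures_per_day(failures):
    """Count failures per (date, category) pair.

    `failures` is an iterable of (timestamp, reason) tuples. This record
    shape is an assumption made for illustration only.
    """
    counts = Counter()
    for timestamp, reason in failures:
        category = reason if reason in KNOWN_REASONS else "other"
        counts[(timestamp.date(), category)] += 1
    return counts
```

Each key of the returned counter corresponds to one point on the timeline for one category.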

      1) Android emulator fails to start:
      Example of a failure:
      https://coin.ci.qt.io/coin/api/log/qt/qtlocation/b74c2684acbc187ac0a22400752661f50039c116/LinuxRHEL_8_8x86_64AndroidAndroid_ANYx86GCCqtci-linux-RHEL-8.8-x86_64-50-a6a815AndroidTestRun_Sccache_UseConfigure_WarningsAreErrors/e06e4ea1db753a74190a2e5ee21e1223440f0375/test_1718060160/log.txt.gz

      Waiting a few minutes for the emulator to fully boot...
      agent:2024/06/10 23:15:53 build.go:404: bootanim= boot_completed= bootcomplete=
      agent:2024/06/10 23:15:53 build.go:256: Virtual Memory Total: 28150202368, Free:14611292160, UsedPercent:16.198523%
      agent:2024/06/10 23:15:53 build.go:268: | PID| PPID|Status| CPU-%|Command

      Example of a success:
      Emulator started successfully
      https://coin.ci.qt.io/coin/api/log/qt/qtdeclarative/0ca02952cfbe4a736803e2cf1857705e9a46a5a3/LinuxRHEL_8_8x86_64AndroidAndroid_ANYx86GCCqtci-linux-RHEL-8.8-x86_64-50-ce1770AndroidTestRun_Sccache_UseConfigure_WarningsAreErrors/84f278b0c70f8affcdb76615ee671ac1b9c12ee2/test_1718836764/log.txt.gz
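The two examples above suggest one possible check, sketched here under assumptions: a log is treated as an emulator start failure when it contains the "Waiting a few minutes..." line but never the "Emulator started successfully" line. Whether these two strings are sufficient for every log is not confirmed by this ticket, and the function names are hypothetical.

```python
import gzip

# Marker strings copied from the log excerpts in this ticket.
BOOT_WAIT_MARKER = "Waiting a few minutes for the emulator to fully boot"
BOOT_OK_MARKER = "Emulator started successfully"


def emulator_failed_to_start(log_text: str) -> bool:
    """Return True if the log shows the boot wait but never the success line."""
    return BOOT_WAIT_MARKER in log_text and BOOT_OK_MARKER not in log_text


def read_gzipped_log(path: str) -> str:
    """Read an already downloaded log.txt.gz file as text."""
    # errors="replace" keeps parsing going if the log contains odd bytes.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return handle.read()
```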

      2) sccache fails:
      sccache: error: failed to execute compile
      agent:2023/10/31 03:31:37 build.go:404: sccache: caused by: error reading compile response from server
      agent:2023/10/31 03:31:37 build.go:404: sccache: caused by: Failed to read response header
      agent:2023/10/31 03:31:37 build.go:404: sccache: caused by: An existing connection was forcibly closed by the remote host. (os error 10054)
      agent:2023/10/31 03:31:37 build.go:404: ninja: build stopped: subcommand failed.

      https://coin.ci.qt.io/coin/api/log/qt/qtconnectivity/e5f1dba18036a43219e7af2a0e515c5bbcf05352/WindowsWindows_11_22H2x86_64WindowsWindows_11_22H2x86_64MSVC2019qtci-windows-11_22H2-x86_64-51-b4b5ceDebugAndRelease_Sccache_UseConfigure/b1873ea6378a64d4422123574d6f16a85cecdb05/build_1703372906/log.txt.gz
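A sketch of how the sccache case might be detected, assuming that the "sccache: error:" prefix seen in the excerpt above is a reliable signature. The function name is hypothetical and the pattern has been checked only against this one excerpt.

```python
import re

# Matches from "sccache: error:" to the end of that line, regardless of any
# "agent:<timestamp> build.go:<line>:" prefix that precedes it.
SCCACHE_ERROR = re.compile(r"sccache: error: .*")


def find_sccache_error(log_text: str):
    """Return the first 'sccache: error: ...' message in the log, or None."""
    match = SCCACHE_ERROR.search(log_text)
    return match.group(0) if match else None
```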

      Dashboard:
      https://testresults.qt.io/grafana/d/edpu33bwhsiyof/build-failures?orgId=1

      28 June 2024
      Categorization by failure type: pick the first failure
      (we will not store precise information if both sccache and the Android emulator fail in the same build/test; the work item will be categorized as either sccache or Android)
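The "first failure wins" rule can be sketched as follows, assuming that "first" means the marker appearing earliest in the log text. The marker strings are taken from the log excerpts in this ticket, the 'other' fallback comes from the description, and the function name is hypothetical.

```python
SCCACHE_MARKER = "sccache: error:"
BOOT_WAIT_MARKER = "Waiting a few minutes for the emulator to fully boot"
BOOT_OK_MARKER = "Emulator started successfully"


def categorize(log_text: str) -> str:
    """Return a single category for a failed work item's log."""
    positions = {}

    sccache_pos = log_text.find(SCCACHE_MARKER)
    if sccache_pos != -1:
        positions["sccache_error"] = sccache_pos

    # The boot wait counts as a failure only if the success line is absent.
    wait_pos = log_text.find(BOOT_WAIT_MARKER)
    if wait_pos != -1 and BOOT_OK_MARKER not in log_text:
        positions["android_emulator_start_failure"] = wait_pos

    if not positions:
        return "other"
    # Keep only the category whose marker appears earliest in the log.
    return min(positions, key=positions.get)
```

Storing only one category per work item keeps the stored data simple, at the cost of losing the second failure when both occur.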

      • sccache must be checked on both failed build and failed test work items, on all branches
      • Android case: failed test work items only (since emulator failures cannot be recovered)
      • categorization names: sccache_error, android_emulator_start_failure
      • run daily at 6 am Finland time; data covered: 6 am the previous day to 6 am the current day
      • inform Rami Potinkara once it is running stably
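The daily window described in the notes could be computed as below. This is a sketch that assumes "Finland time" means the Europe/Helsinki time zone and that the job passes a timezone-aware timestamp; the function name is hypothetical.

```python
from datetime import datetime, time, timedelta
from zoneinfo import ZoneInfo

HELSINKI = ZoneInfo("Europe/Helsinki")


def daily_window(now: datetime):
    """Return (start, end): 6 am the previous day to 6 am today, Finland time.

    `now` must be timezone-aware. If called before 6 am local time, the
    returned `end` is still 6 am of the same local date.
    """
    local_today = now.astimezone(HELSINKI).date()
    end = datetime.combine(local_today, time(6, 0), tzinfo=HELSINKI)
    start = datetime.combine(local_today - timedelta(days=1), time(6, 0), tzinfo=HELSINKI)
    return start, end
```

Building both endpoints from local dates, rather than subtracting 24 hours, keeps the boundaries at 6 am local time across daylight saving changes.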

          People

            anwojcie Anna Wojciechowska
            jujokini Jukka Jokiniva