Qt Quality Assurance Infrastructure
QTQAINFRA-6449

test_datatags table in postgres keeps growing because of 5.15 integrations inserting too many unique datatags


Details

    • Type: Task
    • Resolution: Unresolved
    • Priority: P3: Somewhat important
    • Component/s: Metrics / Test Results

    Description

The test_datatags table has two columns that associate a datatag with a numeric ID:

      CREATE TABLE IF NOT EXISTS test_datatags
      (
          test_datatag_n  integer  PRIMARY KEY  NOT NULL  GENERATED BY DEFAULT AS IDENTITY,
          test_datatag    text  UNIQUE NOT NULL
      );
      

Such lookup tables help normalize the schema (other tables repeat only the numeric ID instead of the same string) and are not supposed to grow much.
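
      For illustration, the lookup-or-insert against such a table typically looks like the following sketch (the actual application queries are not part of this issue; the literal datatag is a placeholder):

      -- Insert the datatag only if it is new; the UNIQUE constraint
      -- resolves the conflict for already-known strings.
      INSERT INTO test_datatags (test_datatag)
      VALUES ('some datatag')
      ON CONFLICT (test_datatag) DO NOTHING;

      -- Then look up the numeric ID for use in referencing tables:
      SELECT test_datatag_n
      FROM test_datatags
      WHERE test_datatag = 'some datatag';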

In the past we had the problem of completely unique datatags (for example, ones containing memory addresses or random numbers) that never match between test runs; these blew up this frequently accessed table to many millions of rows.

This has mostly been resolved on dev with targeted test fixes; only a few remain.

      But every 5.15 integration keeps adding thousands of new datatags, and this slows things down.

      How to resolve?

Possibly break this table into an "archive" table and a "current" one? The unique values are rarely accessed and could be moved to the archive table:

• So I need an offline procedure to search for datatags that are unique in test_runs_raw and produced only by old work items from old tasks.
      • Then move those to a "datatags_archive" table (see the SQL sketch after this list).
      • The foreign-key relationship test_runs_raw -> test_datatags will need to be broken, since datatags will be missing. I do not want a foreign-key relationship to the archive table, because the archive table would then become the bottleneck.
      • Change any JOIN on test_datatag_n to a LEFT JOIN in the views that read test_runs_raw.
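
      A rough sketch of that procedure follows. The test_runs_raw column for work items, the constraint name, and the "old" cutoff are assumptions and placeholders, not the real schema:

      -- Hypothetical sketch: archive datatags referenced exactly once
      -- in test_runs_raw, and only by old work items.
      CREATE TABLE IF NOT EXISTS datatags_archive
      (
          test_datatag_n  integer  PRIMARY KEY  NOT NULL,
          test_datatag    text  NOT NULL
      );

      -- Break the FK first; the constraint name below is Postgres'
      -- default and may differ (check pg_constraint).
      ALTER TABLE test_runs_raw
          DROP CONSTRAINT IF EXISTS test_runs_raw_test_datatag_n_fkey;

      WITH old_singles AS (
          SELECT test_datatag_n
          FROM test_runs_raw
          GROUP BY test_datatag_n
          HAVING count(*) = 1
             AND max(work_item_n) < 1000000   -- placeholder cutoff for "old"
      )
      INSERT INTO datatags_archive (test_datatag_n, test_datatag)
      SELECT d.test_datatag_n, d.test_datatag
      FROM test_datatags d
      JOIN old_singles USING (test_datatag_n);

      DELETE FROM test_datatags d
      USING datatags_archive a
      WHERE d.test_datatag_n = a.test_datatag_n;

      -- Views must then tolerate missing datatags, e.g.:
      SELECT r.*, d.test_datatag
      FROM test_runs_raw r
      LEFT JOIN test_datatags d USING (test_datatag_n);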

Another strategy is to add a "count" column and keep only values with count > 1.
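
      A minimal sketch of that variant, assuming the application inserts through an upsert (the column name n_seen stands in for the proposed "count"):

      ALTER TABLE test_datatags
          ADD COLUMN IF NOT EXISTS n_seen integer NOT NULL DEFAULT 1;

      -- Each insertion attempt bumps the counter on conflict:
      INSERT INTO test_datatags (test_datatag)
      VALUES ('some datatag')
      ON CONFLICT (test_datatag) DO UPDATE
          SET n_seen = test_datatags.n_seen + 1;

      -- Periodic cleanup keeps only datatags seen more than once.
      -- Note: deleting rows has the same FK implications as archiving.
      DELETE FROM test_datatags WHERE n_seen = 1;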

This needs some thought, because the UNIQUE index covering the whole table is what makes INSERTs performant, and that index is currently 1 GB.

• How can I keep it from blowing up Postgres' shared_buffers?
      • Maybe move it to the compressed drive so that it is easier for the kernel to cache?
      • If I remove the UNIQUE index, will INSERTs still be performant? Will uniqueness be guaranteed by the application logic? TODO: test on the dev server (see the queries sketched below).
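
      Some queries that may help investigate these questions on the dev server; the index name assumes Postgres' default naming for the UNIQUE constraint and may differ:

      -- Current on-disk size of the UNIQUE index:
      SELECT pg_size_pretty(pg_relation_size('test_datatags_test_datatag_key'));

      -- With the pg_buffercache extension, how much of shared_buffers
      -- the index occupies right now:
      CREATE EXTENSION IF NOT EXISTS pg_buffercache;
      SELECT count(*) * current_setting('block_size')::int AS bytes_cached
      FROM pg_buffercache b
      JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
      WHERE c.relname = 'test_datatags_test_datatag_key';

      -- Moving the index to another tablespace (e.g. one on the
      -- compressed drive) is possible; "compressed" is a placeholder name:
      -- ALTER INDEX test_datatags_test_datatag_key SET TABLESPACE compressed;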



          People

Assignee: jimis (Dimitrios Apostolou)
            Reporter: jimis (Dimitrios Apostolou)