Details
-
Task
-
Resolution: Unresolved
-
P3: Somewhat important
-
None
-
None
-
None
Description
test_datatags table has two columns, that associate a datatag with a numeric ID:
CREATE TABLE IF NOT EXISTS test_datatags ( test_datatag_n integer PRIMARY KEY NOT NULL GENERATED BY DEFAULT AS IDENTITY, test_datatag text UNIQUE NOT NULL );
Such tables help normalize the schema (avoid repeating the same string on other tables, only the numeric id) and are not supposed to grow much.
In the past we had the problem of too many completely unique datatags (for example memory locations or random numbers) that don't match between different test runs, that have blown up this frequently accessed table to many million rows.
This has mostly been resolved on dev with targetted test fixes. Only few remain.
But every 5.15 integration keeps adding thousands of new datatags, and this slows things down.
How to resolve?
Possibly break this table into one "archive" table and a "current" one? The unique values are rarely accessed and could be moved to the archival table.
- So I need an offline procedure to search for datatags unique in test_runs_raw, produced only from old workitems from old tasks.
- Then move those to a "datatags_archive" table.
- Will need to break the foreign-key relationship test_runs_raw->test_datatags since datatags will be missing. I do not want to have a foreign-key relationship to the archive table, because the archive table would become the bottleneck then.
- Change any JOIN to test_datatag_n to LEFT JOIN in the views that read test_runs_raw.
Another strategy is to add another "count" column and only keep values with count>1.
This needs a bit of thinking because the UNIQUE index covering the whole table is necessary to do the INSERT in a performant way. And that index is 1GB for now.
- How can I mark it to not blow up postgres' shared_buffers?
- Maybe move it to the compressed drive so that it's easier for the kernel to cache?
- If I remove the UNIQUE index will the INSERTION be performant? Will the uniqueness be guaranteed by the application logic? TODO test on dev server.