Details
- Task
- Resolution: Out of scope
- P3: Somewhat important
- None
- 6.2
- None
Description
The blacklisting mechanism in QtTestLib has the drawback (among others I could expand on) that it lives outside the source of the test, and is hence invisible to developers working on the test.
Its purpose is analogous to that of the QEXPECT_FAIL() mechanism, but for use where the failure isn't reliable.
It would thus make sense to replace it with a QKNOWN_FLAKY() or QEXPECT_FLAKY() macro, embedded in code, that communicates the same information.
This would have the advantage of being applicable to exactly the check (or each check), within a test, that is known to be flaky, while also making sure that those editing the test are aware of the issue, without having to look elsewhere. The Continue vs Abort choice from QEXPECT_FAIL() can be carried over just as usefully, albeit only applied when the flakiness manifests: when it doesn't, the test always continues past the given check.
The present BXPASS category would have no equivalent with use of this macro: the relevant check would be marked with either QKNOWN_FLAKY or QEXPECT_FAIL, never both, so passing would either be cheerfully accepted as the flakiness not manifesting (so not reported at all, at the time of the check) or a plain XPASS, as at present.
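The per-check semantics described above can be sketched with a small stand-alone model. Nothing here is existing QtTestLib API: `TestRun`, `knownFlaky()` and `verify()` are hypothetical stand-ins for testlib's internal state, the proposed macro and a QVERIFY()-like check, and the log strings are only illustrative.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Mirrors the Continue/Abort choice of QEXPECT_FAIL().
enum class FlakyMode { Continue, Abort };

// Hypothetical model of one test function's run; not QtTestLib API.
class TestRun
{
public:
    // Stand-in for the proposed QKNOWN_FLAKY(): marks only the next check.
    void knownFlaky(const std::string &comment, FlakyMode mode)
    {
        m_armed = true;
        m_comment = comment;
        m_mode = mode;
    }

    // Stand-in for a QVERIFY()-like check. Returns false when the test
    // function should stop executing.
    bool verify(bool condition, const std::string &what)
    {
        const bool wasArmed = m_armed;
        m_armed = false; // the marker covers exactly one check
        if (condition)
            return true; // flakiness did not manifest: nothing is reported
        if (wasArmed) {
            m_log.push_back("FLAKY: " + what + " (" + m_comment + ")");
            return m_mode == FlakyMode::Continue;
        }
        m_log.push_back("FAIL!: " + what);
        return false;
    }

    const std::vector<std::string> &log() const { return m_log; }

private:
    bool m_armed = false;
    FlakyMode m_mode = FlakyMode::Abort;
    std::string m_comment;
    std::vector<std::string> m_log;
};

// Usage: a marked failure is logged as FLAKY and the test may continue;
// an unmarked failure afterwards is still a plain failure.
void perCheckExample()
{
    TestRun run;
    run.knownFlaky("timing-dependent", FlakyMode::Continue);
    assert(run.verify(false, "signal arrived in time")); // continues
    assert(!run.verify(false, "result is correct"));     // plain failure
    assert(run.log().size() == 2);
}
```

In this model a passing check consumes the marker silently, which is why no counterpart to BXPASS is needed.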
This would require some way of encoding, in a parameter to the macro, the information presently contained in the blacklist file: the platform, compiler and their versions, whether being run from Coin, etc. This should not be especially difficult, as a string can contain the same information, for example by joining the lines of the relevant stanza of the blacklist file with a separator such as the vertical bar, denoting "or". That matches the semantics of the existing file: conditions on a line, separated by spaces, are combined as "and", while separate lines are joined as "or". We can also (see discussion below) have one use of the macro per line of the blacklisting stanza.
The infrastructure to support this would simply adapt the code presently used to manage blacklisting. It could also support context-dependent friends of QEXPECT_FAIL and QSKIP, to simplify plenty of existing code that uses these macros conditioned on exactly the sorts of things encoded in blacklisting selectors. (If we go with QKNOWN_FLAKY as the new macro, this would let the context-conditioned sibling of QEXPECT_FAIL be called QKNOWN_FAIL to match; we might also want to include a QEXPECT_FLAKY that's equivalent to QKNOWN_FLAKY with its context set to "*", to match that relationship.)
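As a rough illustration of the string encoding suggested above, the following stand-alone sketch evaluates a selector in which "|" separates alternatives ("or"), spaces separate conditions within an alternative ("and"), and "*" always matches. The function `selectorMatches()` and the idea of passing the active keywords as a set are assumptions made for this example, not existing testlib code, and the keywords used (linux, gcc, macos, ci) are only illustrative.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper, not part of QtTestLib: reports whether a context
// selector applies, given the keywords that describe the current run.
bool selectorMatches(const std::string &selector,
                     const std::set<std::string> &activeKeys)
{
    std::istringstream alternatives(selector);
    std::string alternative;
    // Each '|'-separated part plays the role of one line of a stanza.
    while (std::getline(alternatives, alternative, '|')) {
        std::istringstream conditions(alternative);
        std::string condition;
        bool sawCondition = false;
        bool allHold = true;
        // Every space-separated condition in the part must hold.
        while (conditions >> condition) {
            sawCondition = true;
            if (condition != "*" && activeKeys.count(condition) == 0) {
                allHold = false;
                break;
            }
        }
        if (sawCondition && allHold)
            return true; // one matching alternative is enough
    }
    return false; // no alternative matched (or the selector was empty)
}

// Usage: "linux gcc|macos ci" means (linux and gcc) or (macos and ci).
void selectorExample()
{
    assert(selectorMatches("linux gcc|macos ci", {"linux", "gcc"}));
    assert(!selectorMatches("linux gcc|macos ci", {"linux", "clang"}));
}
```

A real implementation would presumably reuse the keyword set that the blacklisting code already computes, rather than take it as a parameter.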
There may be tests in which the flakiness is not sufficiently localised that it is practical to pin it to a single check within the test. While this is surely evidence of a bad test, we may find it necessary to have a variant on the macro that applies to all checks, after it, in a given test, rather than only to the immediate next check (as for QEXPECT_FAIL). Likewise, although this would be even clearer evidence of a bad test, we may need some such tag for the whole test class (or possibly a variant of the QTEST_MAIN macro), perhaps by using in initTestCase() the macro that would, in a test, mark all subsequent checks within that test; this corresponds to the opening stanza of a blacklisting file, before the first test-case tag.
(This would bring us back to having a potential flaky XPASS, but this would be best reported simply as FLAKY, based on the whole-test flaky tag.)
This is probably best left until we have practical experience with the basic per-check macro, though, and our attempts to convert existing blacklists to use it.
If a test with a check-specific flaky tag continues past a FLAKY to a subsequent FAIL!, XPASS or SKIP, it makes sense to report the test as failing or skipping, as appropriate, as if the FLAKY had not arisen. This may complicate testlib's logging code. For a whole-test or whole-class flaky indicator, of course, a FAIL! or XPASS is already covered by the flakiness (and reported simply as FLAKY, equivalent to blacklisted tests presently being counted as blacklisted); but we may want to consider treating a skip as the test outcome, even if it arose after either of these (this would differ from present handling of a skip in a blacklisted test).
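One way to read the reporting rules above is as a function from the sequence of events in a test to its reported outcome. The sketch below is only a model under stated assumptions: the `Event` and `FlakyScope` enumerations and `resolveOutcome()` are hypothetical names, and it adopts the option floated above of letting a skip stand as the outcome even under a whole-test flaky tag.

```cpp
#include <cassert>
#include <vector>

// Hypothetical event kinds recorded while a test function runs.
enum class Event { Pass, Flaky, Skip, XPass, Fail };

// Whether flakiness was tagged on individual checks or on the whole test.
enum class FlakyScope { PerCheck, WholeTest };

// Hypothetical model of how the final outcome might be chosen.
Event resolveOutcome(const std::vector<Event> &events, FlakyScope scope)
{
    bool sawFlaky = false;
    for (Event e : events) {
        switch (e) {
        case Event::Pass:
            break;
        case Event::Flaky:
            sawFlaky = true; // remembered, but later events may override it
            break;
        case Event::Skip:
            return Event::Skip; // assumption: a skip wins in either scope
        case Event::XPass:
        case Event::Fail:
            // A whole-test tag covers the failure; a per-check tag does not.
            return scope == FlakyScope::WholeTest ? Event::Flaky : e;
        }
    }
    return sawFlaky ? Event::Flaky : Event::Pass;
}

// Usage: a per-check FLAKY followed by a real failure reports the failure.
void outcomeExample()
{
    assert(resolveOutcome({Event::Flaky, Event::Fail}, FlakyScope::PerCheck)
           == Event::Fail);
    assert(resolveOutcome({Event::Pass, Event::Fail}, FlakyScope::WholeTest)
           == Event::Flaky);
}
```

Whether the skip rule should apply under a whole-test tag is exactly the point left open above, so that branch is the one most likely to change.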
The transition would require support for both the new and the old, of course – if only because the Qt project's own tests include enough blacklisting that making the transition would be too great an upheaval to attempt all in one go.
Open questions:
- How to report the outcome of a test when tagged as known-flaky. I propose that we have a separate FLAKY tag that replaces FAIL! (and, if needed, XPASS) as the outcome if the tagged condition does fail, regardless of whether it continues or aborts. If it passes, the test continues to a subsequent outcome as usual, rather than being tagged as a "flaky pass" (cf. the present BPASS), and a subsequent XFAIL or flaky result would be handled just as if the earlier unexercised flaky hadn't been tagged.
Issue Links
- relates to
  - QTBUG-94476 Add varargs variants of QFAIL(), QSKIP(), QVERIFY2() that take printf-type format strings and parameters for them (Open)
  - QTQAINFRA-6400 Create a policy about handling and following up on flaky tests (In Progress)