Details
-
Epic
-
Resolution: Unresolved
-
P2: Important
Switching to LibTooling from LibClang
Description
QDoc parses the codebase of a C++ project, using a parser based on
LibClang, to build a view of the code that is later used to validate
user-provided documentation, warn about missing documentation, and so on.
The code for the parser is now pretty old and might require some
revisiting in general; in particular, there should be some
consideration for moving the code to a LibTooling (clang's C++ API)
based parser instead of a LibClang (clang's C API) one.
We recently had quite a few feature requests where having access to
the C++ AST would have greatly simplified the implementation and
avoided a series of pessimizations that are required by the use of the
less precise LibClang representation of C++ code.
For example, QTBUG-101649 or QTBUG-104946 (having access to the actual
deprecated attribute instead of LibClang's
UnexposedAttribute+Deprecated Availability).
We expect that the move to LibTooling would generally simplify both
the internal code for QDoc and our ability to extend QDoc with new
features that are C++ related.
The amount of work required to move from LibClang to LibTooling is
currently unknown.
No specific blocker is currently known, but the ability of LibTooling
to express the current features of QDoc in a slim way is to be
considered tentative until an implementation prototype is provided.
As part of the move, a series of concerns should be addressed:
In QTBUG-78197, the parsing done with clang was by far the slowest
part of a QDoc execution in both the -prepare and -generate phases,
tentatively occupying as much as 60% of the total execution time.
Although it is not the only known hotspot, it is to be considered the
most important one.
This is due to a series of factors, with a major contributing factor
being the multiple parsing phases due, in part but not only, to the
-prepare and -generate split for QDoc and the way this is tied to the
build system [which is expected to be addressed, at some point, in
favor of a single execution mode that allows the user to more easily
run QDoc without a build system dependency].
The implementation itself is sometimes inefficient, having to traverse
multiple nodes or do relatively expensive per-function computations to
retrieve the required data, in some cases due to the use of the C API
itself.
A general check of what we currently do against what we can do, the
use of some of the possibly better utilities from LibTooling, and a
concern for parallelization should be considered during the move.
This has to generally be addressed and, if the move to LibTooling is
made, should be addressed as part of it as a requirement for the move
to be completed.
The parsing that QDoc does is used to produce an internal, in-memory
database of customized data-structures [mostly Nodes] that are later
compared to the structures extracted from the provided documentation
comments as a form of validation and data extraction.
The internal representation of the database does not necessarily
follow what would be sensible from a C++ view of the same
information, in part due to the mixed concerns of the data structures
in use, which have sometimes proven to be a blocker to the
implementation of C++ features.
This currently ties those data structures to multiple parts of QDoc,
such that they are required to have an api that is multi-purpose and
doesn't always allow for a good abstraction, ending up as a mixed bag
of features.
Adding a feature that requires the extraction of additional
information from the code always requires touching the parser itself
and later coupling the additional parsing with changes to the internal
data structures that end up propagating to other phases of the
execution, leaking some of the concerns between phases.
This coupling further results in a memory model where access to much
of the data ends up split between the phases, producing
semi-global mutable state that has proven complex to maintain and
inspect in the long run.
As part of the move to LibTooling, some of those concerns should be
addressed, if possible.
By having a better abstracted AST, we could more easily separate the
code parsing phase and the conversion to our internal data structure,
saving a view of the code that better represents the internal
language structures and having easier access to some elements.
This would in turn allow us to leave the parser itself, which is one
of the most complex parts of QDoc, alone when working on some of the
new features.
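As a rough illustration of that separation, the sketch below keeps a
parser-facing record that mirrors the C++ declaration and converts it
to a documentation-facing node in a separate step. All names here
(`ParsedFunction`, `DocNode`, `toDocNode`) are hypothetical and do not
correspond to existing QDoc or clang types; the parser side is stubbed
out with plain data.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical parser-side record: mirrors what the C++ AST says about a
// function, with no documentation concerns mixed in.
struct ParsedFunction {
    std::string qualifiedName;
    std::vector<std::string> parameterTypes;
    bool isDeprecated = false;
};

// Hypothetical documentation-side node: only what later phases need.
struct DocNode {
    std::string name;
    std::string signature;
    std::string status; // "active" or "deprecated"
};

// The conversion lives outside the parser, so a new documentation feature
// would change this function rather than the parsing code.
DocNode toDocNode(const ParsedFunction &fn)
{
    assert(!fn.qualifiedName.empty());

    std::string signature = fn.qualifiedName + "(";
    for (std::size_t i = 0; i < fn.parameterTypes.size(); ++i) {
        if (i > 0)
            signature += ", ";
        signature += fn.parameterTypes[i];
    }
    signature += ")";

    return DocNode{fn.qualifiedName, signature,
                   fn.isDeprecated ? "deprecated" : "active"};
}
```

In such a layout, the LibTooling-based parser would only be responsible
for filling in `ParsedFunction`-like records, while anything specific
to documentation would sit on the conversion side.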
The result of the parsing is not easily observable.
This is a result of the internal representation that is used; the lack
of a good API for accessing that data (for example, the general API
for QDocDatabase has many ways to do almost the same thing, without
the ability for the user to discern the actual differences [and when
to use each entrypoint] without knowing various implementation
details); the scope of each stored element, whose state is mutated
throughout much of the codebase (for example, the merging of
`CollectionNode`s, which is done ad-hoc during the generation phase);
the general lack of documentation and of a formalization of the data
model in use; and the coupling between the ability to run the parser
and a whole QDoc execution.
This has proven many times to slow down new development and,
especially, the debugging of bugs caused by data produced during the
parsing phase (which leaks up to the generation phase).
While moving to LibTooling, we should consider the possibility of
making the parser an external tool that is easily run on the
codebase, as QDoc would, to allow for observing the produced state;
the use of an actual database or on-file format that would allow the
inspection of an artifact after a run (where the index files are
currently a partial solution to the problem); and an ad-hoc API for
querying the data itself that better represents the general queries
that are performed in later phases.
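A minimal sketch of what an inspectable artifact plus a small query
entrypoint could look like is shown below. The `Declaration` record,
the tab-separated line format and the `dump`, `load` and `findByKind`
functions are all assumptions made for illustration; they are not an
existing QDoc format or API, and a real design might well use a proper
database instead. Strings are used in place of files to keep the
example self-contained.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one parsed declaration.
struct Declaration {
    std::string kind;          // e.g. "class" or "function"
    std::string qualifiedName;
    std::string location;      // e.g. "file.h:12"
};

// Writes one declaration per line as tab-separated fields, so that the
// result of a parser run can be inspected with ordinary text tools.
std::string dump(const std::vector<Declaration> &decls)
{
    std::ostringstream out;
    for (const Declaration &d : decls) {
        // Tabs and newlines are separators in this simplified format,
        // so they must not appear inside a field.
        assert(d.kind.find_first_of("\t\n") == std::string::npos);
        assert(d.qualifiedName.find_first_of("\t\n") == std::string::npos);
        assert(d.location.find_first_of("\t\n") == std::string::npos);
        out << d.kind << '\t' << d.qualifiedName << '\t' << d.location << '\n';
    }
    return out.str();
}

// Reads the artifact back; lines that do not have three non-empty
// fields are skipped.
std::vector<Declaration> load(const std::string &text)
{
    std::vector<Declaration> decls;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        Declaration d;
        if (std::getline(fields, d.kind, '\t')
            && std::getline(fields, d.qualifiedName, '\t')
            && std::getline(fields, d.location))
            decls.push_back(d);
    }
    return decls;
}

// A single, purpose-specific query instead of many overlapping entrypoints.
std::vector<Declaration> findByKind(const std::vector<Declaration> &decls,
                                    const std::string &kind)
{
    std::vector<Declaration> matches;
    for (const Declaration &d : decls) {
        if (d.kind == kind)
            matches.push_back(d);
    }
    return matches;
}
```

Because the artifact is produced by the parsing step alone, it could be
generated and examined without running the generation phase.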
Due to the leaking of concerns and the general coupling between parts
of the codebase, it is expected that completing this task might
require possibly complex changes that are adjacent to it.
For example, changes to `DocParser` might be required as it leaks
incorrect data to the later phases, resulting in some complications in
different parts of the codebase.
It is expected that it might not be possible to address all concerns;
the use of some temporary glue code to separate new interfaces from
the currently existing ones in QDoc is expected to be required during
the transition.
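One possible shape for such glue code is an adapter that answers a
new, narrow query interface from the existing store. The sketch below
is only an assumption about how that could look: `LegacyNode` and
`LegacyDatabase` are simplified stand-ins rather than QDoc's actual
classes, and `DeclarationQuery` and `LegacyQueryAdapter` are
hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified stand-in for an existing data structure (hypothetical).
struct LegacyNode {
    std::string name;
    bool deprecated;
};

// Simplified stand-in for the existing multi-purpose store (hypothetical).
class LegacyDatabase {
public:
    void insert(const LegacyNode &node) { m_nodes[node.name] = node; }

    const LegacyNode *find(const std::string &name) const
    {
        auto it = m_nodes.find(name);
        return it == m_nodes.end() ? nullptr : &it->second;
    }

private:
    std::map<std::string, LegacyNode> m_nodes;
};

// New, narrow, read-only interface that later phases would code against.
class DeclarationQuery {
public:
    virtual ~DeclarationQuery() = default;
    virtual bool exists(const std::string &name) const = 0;
    virtual bool isDeprecated(const std::string &name) const = 0;
};

// Temporary glue: implements the new interface on top of the legacy store,
// so callers can be migrated before the store itself is replaced.
class LegacyQueryAdapter : public DeclarationQuery {
public:
    explicit LegacyQueryAdapter(const LegacyDatabase &db) : m_db(db) {}

    bool exists(const std::string &name) const override
    {
        return m_db.find(name) != nullptr;
    }

    bool isDeprecated(const std::string &name) const override
    {
        const LegacyNode *node = m_db.find(name);
        assert(node != nullptr); // callers are expected to check exists() first
        return node->deprecated;
    }

private:
    const LegacyDatabase &m_db;
};
```

Once every caller depends only on the narrow interface, the adapter
could be swapped for an implementation backed by the new parser output
and then removed.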
Issue Links
- blocks: QTBUG-104946 Get API deprecation info from C++ macros (Blocked)