QTBUG-105255 (project: Qt)

Explore switching to Libtooling for QDoc C++ parsing needs


Details

    • Epic
    • Resolution: Unresolved
    • P2: Important
    • Build tools: qdoc
    • Switching to Libtooling from LibClang

    Description

      QDoc parses the codebase of a C++ project, using a parser based on
      LibClang, to build a view of the code that is later used to validate
      user-provided documentation, warn about missing documentation, etc.

      The code for the parser is now pretty old and might require some
      revisiting in general; in particular, there should be some
      consideration for moving the code to a LibTooling (clang's C++ API)
      based parser instead of a LibClang (clang's C API) one.

      We recently had quite a few feature requests where having access to
      the C++ AST would have greatly simplified the implementation and
      avoided a series of pessimizations that are required by the use of
      the less precise LibClang representation of C++ code.

      Examples include QTBUG-101649 and QTBUG-104946 (having access to the
      actual deprecated attribute instead of LibClang's
      UnexposedAttribute+Deprecated Availability).

      We expect that the move to LibTooling would generally simplify the
      internal code of QDoc and improve our ability to extend QDoc with new
      C++-related features.

      The amount of work required to move from LibClang to LibTooling is
      currently unknown.

      No specific blocker is currently known, but the ability of LibTooling
      to express the current features of QDoc in a slim way is to be
      considered tentative until an implementation prototype is provided.

      As part of the move a series of concerns should be addressed:


      In QTBUG-78197, the parsing done with Clang was by far the slowest
      part of a QDoc execution in both the -prepare and -generate phases,
      tentatively occupying as much as 60% of the total execution time.
      Although not the only known hotspot, it is to be considered the most
      important one.

      This is due to a series of factors. A major one is the presence of
      multiple parsing phases, caused in part, but not only, by the
      -prepare and -generate split for QDoc and the way this is tied to the
      build system [which is expected to be addressed, at some point, in
      favor of a single execution mode that allows the user to more easily
      run QDoc without a build system dependency].

      The implementation itself is sometimes inefficient, having to traverse
      multiple nodes or do relatively expensive per-function computations
      to retrieve the required data, in some cases due to the use of the C
      API itself.

      A general check of what we currently do against what we can do, the
      use of some of the possibly better utilities from LibTooling and a
      concern for parallelization should be considered during the move.

      This has to be addressed in general and, if the move to LibTooling is
      made, it should be addressed as part of it, as a requirement for the
      move to be considered complete.


      The parsing that QDoc does is used to produce an internal, in-memory
      database of customized data-structures [mostly Nodes] that are later
      compared to the structures extracted from the provided documentation
      comments as a form of validation and data extraction.

      The internal representation of the database does not necessarily
      follow what would be sensible from a C++ view of the same
      information, in part due to the mixed concerns of the data structures
      in use, which have sometimes proven to be a blocker to the
      implementation of C++ features.

      These data structures are currently tied to multiple parts of QDoc,
      such that they are required to have a multi-purpose API that
      doesn't always allow for a good abstraction, ending up as a mixed bag
      of features.

      Adding a feature that requires the extraction of additional
      information from the code always requires touching the parser itself
      and later coupling the additional parsing with changes to the
      internal data structures that end up propagating to other phases of
      the execution, leaking some of the concerns between phases.

      This coupling further leads to a memory model where access to much of
      the data ends up split between the phases, resulting in semi-global
      mutable state that has proven complex to maintain and inspect in the
      long run.

      As part of the move to LibTooling, some of those concerns should be
      addressed, if possible.

      By having a better abstracted AST, we could more easily separate the
      code parsing phase from the conversion to our internal data
      structures, saving a view of the code that better represents the
      internal language structures and having easier access to some
      elements.

      This would in turn allow us to leave the parser itself, which is one
      of the most complex parts of QDoc, alone when working on some of the
      new features.
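
      The separation described above can be sketched as two independent
      layers: a parser-facing record that only mirrors what the C++ code
      says, and a conversion step that produces documentation-facing
      structures. This is a minimal illustration only; `FunctionRecord`,
      `DocNode`, `toDocNode` and `convertAll` are hypothetical names that
      do not exist in QDoc, and no Clang API is used here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical parser-facing record: mirrors the C++ source only,
// with nothing specific to documentation generation.
struct FunctionRecord {
    std::string qualifiedName;
    std::string returnType;
    bool isDeprecated = false;
};

// Hypothetical documentation-facing structure, standing in for the
// internal nodes that later phases would consume.
struct DocNode {
    std::string title;
    std::string signature;
    std::string status;
};

// Conversion lives outside the parser, so a new documentation feature
// can change this function without touching the parsing code.
DocNode toDocNode(const FunctionRecord &record)
{
    assert(!record.qualifiedName.empty());
    DocNode node;
    node.title = record.qualifiedName;
    node.signature = record.returnType + " " + record.qualifiedName + "()";
    node.status = record.isDeprecated ? "deprecated" : "active";
    return node;
}

// Converts every parsed record, preserving the input order.
std::vector<DocNode> convertAll(const std::vector<FunctionRecord> &records)
{
    std::vector<DocNode> nodes;
    nodes.reserve(records.size());
    for (const FunctionRecord &record : records)
        nodes.push_back(toDocNode(record));
    return nodes;
}
```

      With this shape, the parser would only ever produce
      `FunctionRecord` values, and changes to how documentation is
      presented would stay confined to `toDocNode`.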


      The result of the parsing is not easily observable.

      This is a result of: the internal representation that is used; the
      lack of a good API for accessing that data (for example, the general
      API for QDocDatabase has many ways to do almost the same thing,
      without the user being able to tell the actual differences [and when
      to use each entry point] without knowing various implementation
      details); the scope of each stored element, whose state is mutated
      throughout much of the codebase (for example, the merging of
      `CollectionNode`s, which is done ad hoc during the generation
      phase); the general lack of documentation and of a formalization of
      the data model in use; and the coupling between the ability to run
      the parser and a whole QDoc execution.

      This has proven many times to slow down new development and,
      especially, the debugging of issues caused by data produced during
      the parsing phase (which leaks into the generation phase).

      While moving to LibTooling, we should consider: the possibility of
      making the parser an external tool that can easily be run on the
      codebase, as QDoc would, to allow for observing the produced state;
      the use of an actual database or on-file format that would allow the
      inspection of an artifact after a run (the index files are currently
      a partial solution to the problem); and an ad-hoc API for querying
      the data itself that better represents the general queries that are
      performed in later phases.
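
      As a rough illustration of an inspectable on-file artifact paired
      with a small query API, the sketch below writes parsed symbols as
      tab-separated lines, reads them back, and answers one query. The
      format and the names `SymbolRecord`, `dumpRecords`, `loadRecords`
      and `namesOfKind` are assumptions made for this example; they are
      not QDoc's index file format nor any existing QDoc API.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one parsed symbol.
struct SymbolRecord {
    std::string kind;          // for example "class" or "function"
    std::string qualifiedName;
};

// Serializes records as one "kind<TAB>name" line each, so the artifact
// can be inspected with ordinary text tools after a run.
std::string dumpRecords(const std::vector<SymbolRecord> &records)
{
    std::ostringstream out;
    for (const SymbolRecord &record : records) {
        // The kind must not contain the field separator.
        assert(record.kind.find('\t') == std::string::npos);
        out << record.kind << '\t' << record.qualifiedName << '\n';
    }
    return out.str();
}

// Reads the artifact back; lines without a separator are skipped.
std::vector<SymbolRecord> loadRecords(const std::string &text)
{
    std::vector<SymbolRecord> records;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        const std::string::size_type tab = line.find('\t');
        if (tab == std::string::npos)
            continue;
        records.push_back({line.substr(0, tab), line.substr(tab + 1)});
    }
    return records;
}

// Example of a purpose-built query: all names of a given kind,
// in the order they were stored.
std::vector<std::string> namesOfKind(const std::vector<SymbolRecord> &records,
                                     const std::string &kind)
{
    std::vector<std::string> names;
    for (const SymbolRecord &record : records)
        if (record.kind == kind)
            names.push_back(record.qualifiedName);
    return names;
}
```

      A standalone parser tool could write such a file once, and later
      phases, or a developer debugging a run, could load it and query it
      without running a whole QDoc execution.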


      Due to the leaking of concerns and the general coupling between parts
      of the codebase, it is expected that completing this task might
      require changes, possibly complex ones, that are adjacent to it.
      For example, changes to `DocParser` might be required, as it leaks
      incorrect data to the later phases, resulting in some complications in
      different parts of the codebase.

      It is expected that it might not be possible to address all concerns;
      some temporary glue code to separate the new interfaces from the
      currently existing ones in QDoc is expected to be required during the
      transition.


            People

              docinfrastructure Documentation Infrastructure Team
              diseraluca Luca Di Sera