  1. Qt
  2. QTBUG-104737

Consider refactoring QDoc's Generator classes

Details

    • Task
    • Resolution: Unresolved
    • P2: Important
    • None
    • None
    • Build tools: qdoc
    • None
    • Da Vinci 60

    Description

      When QDoc generates documentation in its "-generate" mode, it
      internally delegates the job to a series of inheritance-based
      constructs called "Generators".

      "Generators" take care of a series of concerns: generating and
      writing the output files themselves, copying files (such as images) to
      the output directory, handling errors, performing some sanity
      checks and so on. In general, they handle the whole pipeline of
      procedures that are required to produce the output documentation.

      From a code perspective, a base `Generator` class provides a template
      interface for a series of subclasses that implement a specific output
      format, such as `HtmlGenerator` and `DocBookGenerator`.

      In practice, this turned out to be a mess.

      The structure of the generators mixes format-specific concerns for an
      element (e.g. how do I write a code block in HTML?) with finding the
      necessary information to produce that element (e.g. traversing the
      atom representation of the documentation to find what needs to be
      generated).

      There are good reasons for this kind of approach: mixing the
      output requirements into the structural logic provides for
      a lower memory footprint and a smaller scope for block-specific pieces
      of information (although the latter is not always achieved by the
      current codebase, due to the shared, mutable, possibly unnecessarily
      global state that most of QDoc uses). However, the required
      entanglement produced a codebase lacking in abstraction, where each
      format-specific generator repeats the structural processing of
      information while customizing the final output written to the
      documentation files.

      Furthermore, possibly due to the natural evolution of QDoc, those
      problems are aggravated by the following:

      • Generators contain code that was duplicated from other
        generators but never synchronized again.
      • Generation is conflated with the production of the data it
        requires (e.g. collections are merged ad hoc in the generation
        phase itself instead of being provided as an immutable source of
        truth), such that the concerns of previous phases leak into the
        generators.
      • Similar information is presented with diverging layouts in
        different places.
      • Half-implemented abstractions are used in some places but not
        others.

      Together, these fragment not only the structural processing of the
      generators but also the output produced for an atomic unit in a
      single format.

      From a maintainability view, this has been a pain point. Slight
      variations of structural logic or format-specific concerns mean
      that a solution might not be portable one-to-one between formatting
      generators (e.g. the recent addition of `FileResolver` required
      slightly different patches for `HtmlGenerator` and
      `DocBookGenerator`). The diverging code paths make it infeasible at
      this point to ensure that the formats stay synchronized or that the
      same edge cases are handled in all of them.

      Furthermore, while QDoc officially supports at least HTML and DocBook,
      only the HTML output is used in Qt, so changes are not always applied
      equally to the other formats, as doing so is not required to complete
      a task.

      The extensibility story is similarly grim.
      Again, any functionality must be replicated separately for each
      format, so the effort grows linearly with the number of formats.
      Likewise, adding a new format is problematic,
      as the structural logic has to be replicated together with the part
      that would actually change: the format-specific concerns for the final
      output.

      In addition to those concerns, the monolithic view of the "generation
      phase", as embodied in the generators, has made it difficult to
      produce sensible tests for smaller parts of the codebase, allowing only
      for a regression testing suite that partially covers specific cases of
      a whole QDoc execution.
      This is made worse by the general approach that generators
      use, being complex entities with dubiously scoped mutable state
      that depend on quite complex QDoc-generated structures whose format is
      not always clear.

      To address some of the concerns with the current structure, with the
      objective of progressively simplifying the maintenance of the
      generation phase of QDoc, a way to refactor this later stage of a
      QDoc execution should be explored. A full-on rewrite is
      inadvisable considering the current complexity of the code and the
      fact that we don't have an automated way to ascertain full
      compatibility between changes.

      The following approach is proposed.

      • An intermediate representation is introduced for the layout of the
        information intended for the final output of QDoc.
        • A case study for a similar representation can be found in
          Pandoc's types abstraction
          (https://pandoc.org/lua-filters.html#lua-type-reference),
          which has proven to be a robust abstraction of documents in
          a multi-format context.
      • For each relevant QDoc format, a converter is introduced that
        produces a format-specific representation of the new intermediate
        representation.
      • QDoc's current layout structures, which are inline and implicit, are
        converted into the new intermediate representation.
      • Each generator delegates all format-specific concerns to the newly
        introduced converters, concerning itself only with destructuring the
        required data to produce the intermediate representation and with
        driving the order of the generation phase.
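
      As an illustration only, the split between an intermediate
      representation and per-format converters could look like the C++17
      sketch below. All names (`Paragraph`, `CodeBlock`, `Block`,
      `Document`, `toHtml`, `toDocBook`) are hypothetical and do not exist
      in QDoc; escaping of special characters is deliberately left out to
      keep the sketch short.

```cpp
#include <string>
#include <variant>
#include <vector>

// Hypothetical intermediate representation; none of these types exist in QDoc.
struct Paragraph { std::string text; };
struct CodeBlock { std::string language; std::string code; };
using Block = std::variant<Paragraph, CodeBlock>;
using Document = std::vector<Block>;

// Small helper to visit a variant with a set of lambdas.
template <class... Ts> struct Overloaded : Ts... { using Ts::operator()...; };
template <class... Ts> Overloaded(Ts...) -> Overloaded<Ts...>;

// Hypothetical HTML converter: only format-specific concerns live here.
// Note: no escaping is performed in this sketch.
inline std::string toHtml(const Document &doc)
{
    std::string out;
    for (const Block &block : doc) {
        out += std::visit(Overloaded{
            [](const Paragraph &p) { return "<p>" + p.text + "</p>\n"; },
            [](const CodeBlock &c) {
                return "<pre class=\"" + c.language + "\">" + c.code + "</pre>\n";
            }
        }, block);
    }
    return out;
}

// Hypothetical DocBook converter over the very same representation.
inline std::string toDocBook(const Document &doc)
{
    std::string out;
    for (const Block &block : doc) {
        out += std::visit(Overloaded{
            [](const Paragraph &p) { return "<para>" + p.text + "</para>\n"; },
            [](const CodeBlock &c) {
                return "<programlisting language=\"" + c.language + "\">" + c.code
                        + "</programlisting>\n";
            }
        }, block);
    }
    return out;
}
```

      The structural logic that builds a `Document` is written once, while
      each converter only decides how a given block is spelled in its
      format.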

      In practice this is intended to be introduced with the following code structure:

      • A new directory, "output", is introduced at the root of QDoc's source code.
        The directory is intended to contain all subsystems related to
        producing the final documentation.
        • A subdirectory "primitives" is introduced under "output".
          The directory is intended to contain all representations of `primitive` objects.
          • `primitive` objects are slim POD types that represent the
            AST of the newly introduced intermediate representation.
        • A subdirectory "formats" is introduced under "output".
          The directory is intended to contain all `formatter` objects.
          • `formatter` objects are initially-stateless slim objects
            that respect a common interface and are concerned with
            converting the newly introduced intermediate
            representation to a format-specific representation.
        • A subdirectory "components" is introduced under "output".
          The directory contains all `components`.
          • `components` are slim function-objects, naked functions
            or similar constructs that convert QDoc's
            data-structures to the new intermediate representation,
            bridging the gap between the generators and the
            `formatters`.
      • A `formatter` object is passed to each generator as a dependency at
        construction time.
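
      A minimal sketch of how the three kinds of objects and the injected
      dependency might fit together is given below. It is an assumption
      about the shape of the code, not the actual design: `Image`,
      `Formatter`, `HtmlFormatter`, `ImageAtom`, `imageComponent` and this
      simplified `Generator` are all hypothetical names, and `ImageAtom`
      merely stands in for one of QDoc's real data structures.

```cpp
#include <string>

// output/primitives: a slim POD-like node of the intermediate representation.
// Hypothetical type, not part of QDoc.
struct Image {
    std::string path;
    std::string altText;
};

// output/formats: the interface every format-specific formatter respects.
class Formatter {
public:
    virtual ~Formatter() = default;
    virtual std::string format(const Image &image) const = 0;
};

// One stateless formatter per output format (no escaping in this sketch).
class HtmlFormatter final : public Formatter {
public:
    std::string format(const Image &image) const override
    {
        return "<img src=\"" + image.path + "\" alt=\"" + image.altText + "\"/>";
    }
};

// Stand-in for one of QDoc's data structures; the real types differ.
struct ImageAtom {
    std::string fileName;
    std::string description;
};

// output/components: a naked function bridging QDoc's data and the primitives.
inline Image imageComponent(const ImageAtom &atom)
{
    return Image{"images/" + atom.fileName, atom.description};
}

// The generator receives its formatter as a dependency at construction time
// and keeps only the structural logic.
class Generator {
public:
    explicit Generator(const Formatter &formatter) : m_formatter(formatter) {}

    std::string generateImage(const ImageAtom &atom) const
    {
        return m_formatter.format(imageComponent(atom));
    }

private:
    const Formatter &m_formatter; // must outlive the generator
};
```

      With this shape, supporting another format would mean adding another
      `Formatter` implementation, while `imageComponent` and `Generator`
      stay untouched.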

      Then, a series of iterative changes are applied using the following procedure:

      • A unit of duplicated code between the generator subclasses that
        produces a block of output is identified.
      • The currently available `primitive` objects are examined to
        understand if they can faithfully represent the block of output
        produced by the unit of code.
        • If this is not the case, a series of new `primitive` objects are
          introduced that are able to faithfully represent the block of
          output produced by the unit of code.
          • If any `primitive` is introduced, each `formatter` object is
            extended to provide a format-native representation of the
            `primitive`.
      • One or more `components` are introduced, destructuring any
        required information into a representation made of `primitive`
        objects that faithfully represents the block of output produced by
        the unit of code.
      • A method is introduced in the base class for `Generators`.
        The method performs three steps:
        • It handles any prerequisites for the consistency of the production of the output.
          For example, copying image files to the output directory.
        • It produces a `primitive` representation through the use of a `component`.
        • It outputs the result of passing the produced `primitive`
          representation to the `formatter` dependency to any relevant
          output file.
      • The original unit of duplicated code is replaced in each generator
        with a back-call to the newly introduced method in the base class
        for generators.

      The iterative application of this process is intended to slowly move
      bigger units of code under composed `components`, slowly replacing
      most of the code in the generator subclasses.

      Due to the original structure of the generators, any block of code
      that produces a complete output can be replaced without any semantic
      change with the above methodology.

      This ensures that the refactoring can be introduced gradually,
      destructuring the output from the inside out at different granularities
      while slowly centralizing the structural representation of a
      generation phase into the base class.

      This same centralization normalizes the difference between the
      generators, removing one of the original pain-points.

      The responsibility split between `components` and `formatters` ensures
      that we do not incur the same duplication and extensibility
      constraints of the original code.

      Furthermore, the more granular approach to components ensures that the
      generators' processing is divided into smaller units, which allows an
      easier understanding of the whole process.

      To reduce the risk of breakage, `components` can generally
      replicate the available code in the generators with regards to
      information processing.

      At the same time, the decentralization of the processing logic allows
      the expression of the required data at the API boundaries, reducing
      the amount of information that the developer needs to keep in mind
      when examining the procedure. This limits, albeit only partially, the
      damage caused by the amount of global mutable state that QDoc uses.

      The division of concerns and the more granular units are intended to
      further provide an entry point for generation-phase tests.
      `formatters` are intended to be fully testable, formalizing the output
      that QDoc produces.
      That is, they provide a followable code path to questions of the form
      "What does my code block look like in the HTML output?".
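
      As a rough illustration of what such a test might look like, the
      sketch below checks the HTML rendering of a code block. The
      `CodeBlock` primitive, `escapeHtml`, this `HtmlFormatter` and the
      exact markup are all assumptions made for the example, and a plain
      `assert` stands in for whatever test framework would actually be
      used.

```cpp
#include <cassert>
#include <string>

// Hypothetical primitive for a block of code.
struct CodeBlock {
    std::string code;
};

// Minimal escaping of the characters that are special in HTML text.
inline std::string escapeHtml(const std::string &text)
{
    std::string out;
    for (char c : text) {
        switch (c) {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;"; break;
        case '>': out += "&gt;"; break;
        default: out += c; break;
        }
    }
    return out;
}

// Hypothetical stateless formatter; the markup is illustrative only.
struct HtmlFormatter {
    std::string format(const CodeBlock &block) const
    {
        return "<pre class=\"cpp\">" + escapeHtml(block.code) + "</pre>";
    }
};

// The test pins down the exact output for one primitive in one format.
inline void testCodeBlockIsEscapedAndWrapped()
{
    const HtmlFormatter formatter{};
    assert(formatter.format(CodeBlock{"a < b && c"})
           == "<pre class=\"cpp\">a &lt; b &amp;&amp; c</pre>");
}
```

      Since the formatter holds no state and depends only on the
      primitive it receives, the test needs no QDoc run and no fixture
      beyond the input value.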

      `components` are expected to be partially testable, provided that a
      large initial investment is made in test-level generators for QDoc's
      data structures.

      The generators are expected to still remain untestable as a unit in
      this phase.

      Further concerns during the iterative refactoring should be the following:

      • Revise the consistency between similar output-layouts.
        Similar information in the current output might be represented quite differently.
        For example, the inherited members for QML and C++ elements.
      • Bubble up side-effecting operations that may fail as much as possible.
        For example, the copy of images to the output directory cannot be
        handled effectively, as it is performed under a series of
        chained calls to inner methods and in the middle of the production
        of the image-referring output. This results in some inconsistencies
        in the HTML representation of a missing image if the
        operation silently fails.
      • Revise the copious branching that is present in code that is
        shared between layouts.
        For example, the various branches in `generateRequisites`.
        Much of the branching could be moved outside the logic through
        case analysis, so as to simplify the actual processing, in
        exchange for some possible code duplication.
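
      To make the second point more concrete, one way to bubble up the
      image copy is to run it before any output is produced and to make
      its outcome part of the data handed to the formatter. The sketch
      below assumes hypothetical names (`Image`, `prepareImage`,
      `formatImageHtml`) and an illustrative placeholder for the missing
      image; it is not how QDoc currently behaves.

```cpp
#include <functional>
#include <optional>
#include <string>

// Hypothetical primitive for an image that is known to be available.
struct Image {
    std::string path;
};

// The side effect runs first and its outcome is returned explicitly,
// instead of failing silently in the middle of output production.
inline std::optional<Image> prepareImage(
        const std::string &fileName,
        const std::function<bool(const std::string &)> &copyFile)
{
    if (!copyFile(fileName))
        return std::nullopt;
    return Image{"images/" + fileName};
}

// The formatter decides, in a single place, how a missing image is shown.
// The placeholder markup is purely illustrative.
inline std::string formatImageHtml(const std::optional<Image> &image)
{
    if (!image)
        return "<!-- missing image -->";
    return "<img src=\"" + image->path + "\"/>";
}
```

      With the failure expressed in the return type, every caller is
      forced to handle the missing-image case, and each format can
      represent it consistently.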

      The expected result is one that has centralized most of the code into
      the single base class for generators, with subclasses mostly
      containing back-calls to this base class, allowing the removal of the
      subclasses themselves.

      Concerns for this change are the introduction of some computational
      overhead when destructuring or converting data structures and
      an expected increase in memory footprint.

      Although this will need to be reviewed, it is expected that the gain
      in code structure will outweigh the negatives.

      The changes are additionally intended as preparatory to some of the
      currently considered long-term goals for QDoc:

      • Being able to sensibly parallelize/make concurrent the production of
        output files during the generation phase.
        This is intended to be explored as a consequence of the refactoring,
        if it is successful, to regain some of the performance lost to the
        introduced overhead.
      • Being able to describe some of QDoc's operation through a
        well-defined interface that can later be exposed to allow for
        customization.
      • Being able to have less monolithic phases that each produce
        a serializable intermediate representation that can be used to
        introduce incremental compilation.

      The exploration of this task has been approved by QDoc's maintainer treinio.

            People

              diseraluca Luca Di Sera
              diseraluca Luca Di Sera
