Details

Type: Task
Resolution: Unresolved
Priority: P2: Important
Fix Version/s: None
Affects Version/s: None
Component/s: Build tools: qdoc
Labels: None
Sprint: Da Vinci 60

Description

When QDoc generates documentation in its "-generate" mode, it
internally delegates the job to a series of inheritance-based
constructs called "Generators".

Generators take care of a series of concerns: generating and
writing the output files themselves, copying files (such as images) to
the output directory, handling errors, performing some sanity
checks, and so on. In general, they handle the whole pipeline of
procedures required to produce the output documentation.

From a code perspective, a base `Generator` class provides a template
interface for a series of subclasses that implement a specific output
format, such as `HtmlGenerator` and `DocBookGenerator`.

In practice, this turned out to be a mess.

The structure of the generators mixes format-specific concerns for an
element (e.g. "How do I write a code block in HTML?") with finding the
information necessary to produce that element (e.g. traversing the
atom representation of the documentation to find what needs to be
generated).

There are good reasons for this kind of approach: mixing the
output requirements into the structural logic allows for
a lower memory footprint and a smaller scope for block-specific pieces
of information (although the current codebase does not always achieve
the latter, due to the shared, mutable, possibly unnecessarily global
state that most of QDoc uses). However, the required entanglement
produced a codebase lacking in abstraction, where each format-specific
generator repeats the structural processing of information while
customizing the final output to the documentation files.

Furthermore, possibly due to the natural evolution of QDoc, those
problems are aggravated by the following:
- Generators contain code that was duplicated from other generators
  but never synchronized again.
- Generation is conflated with producing the data it requires (e.g.
  merging collections ad hoc in the generation phase itself instead of
  providing an immutable source of truth), so that the concerns of
  previous phases leak into the generators.
- Layouts diverge for similar information presented in different places.
- Half-implemented abstractions are used in some places but not in
  others.
Together, these fragment not only the structural processing of the
generators but also the output produced for an atomic unit within a
single format.

From a maintainability point of view, this has been a pain point.
Slight variations in structural logic or format-specific concerns mean
that a solution might not be portable one-to-one between format generators
(e.g. the recent addition of `FileResolver` required slightly different
patches for `HtmlGenerator` and `DocBookGenerator`). The diverging
code paths make it infeasible at this point to ensure that the formats
stay synchronized or that the same edge cases are handled in all of
them.

Furthermore, while QDoc officially supports at least HTML and DocBook,
we only use the HTML output in Qt. As a result, not all changes are
applied equally to the other formats, as doing so is not required to
complete a task.

The extensibility story is similarly grim.
New functionality must be implemented separately for each format, so
the effort grows linearly with the number of formats.
Adding a new format is similarly problematic,
as the structural logic has to be replicated together with the part
that actually changes: the format-specific concerns for the final
output.

In addition to those concerns, the monolithic view of the "generation
phase" embodied in the generators has made it difficult to
produce sensible tests for smaller parts of the codebase, allowing only
for a regression testing suite that partially covers specific cases of
a whole QDoc execution.
This is made worse by the general approach of the generators, which
are complex entities with dubiously scoped mutable state
that depend on quite complex QDoc-generated structures whose format is
not always clear.

To address some of the concerns with the current structure, and with
the objective of progressively simplifying the maintenance of the
generation phase of QDoc, a way to refactor this later stage of a QDoc
execution should be explored. A full-on rewrite is
inadvisable, considering the current complexity of the code and the
fact that we don't have an automated way to ascertain full
compatibility between changes.
The following approach is proposed:

- An intermediate representation is introduced for the layout of the
  information intended for the final output of QDoc.
  - A case study for a similar representation can be found in
    Pandoc's types abstraction
    (https://pandoc.org/lua-filters.html#lua-type-reference),
    which has proven to be a robust abstraction of documents in
    a multi-format context.
- For each relevant QDoc format, a converter is introduced that
  produces a format-specific representation of the new intermediate
  representation.
- QDoc's current layout structures, which are inline and implicit, are
  converted into the new intermediate representation.
- Each generator delegates all format-specific concerns to the newly
  introduced converters, concerning itself only with destructuring the
  required data to produce the intermediate representation and with
  leading the order of the generation phase.
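
As a minimal sketch of the idea, assuming a hypothetical `CodeBlock`
type for the intermediate representation and hypothetical `toHtml` and
`toDocBook` converters (none of these names exist in QDoc today), the
same format-agnostic value can be handed to one converter per format:

```cpp
#include <string>

// Hypothetical piece of the intermediate representation: a
// format-agnostic description of a code block.
struct CodeBlock {
    std::string language;
    std::string source;
};

// Escapes the characters that are special in both HTML and XML text.
std::string escapeMarkup(const std::string &text)
{
    std::string escaped;
    for (char c : text) {
        switch (c) {
        case '&': escaped += "&amp;"; break;
        case '<': escaped += "&lt;"; break;
        case '>': escaped += "&gt;"; break;
        default: escaped += c; break;
        }
    }
    return escaped;
}

// Hypothetical HTML converter for the representation.
std::string toHtml(const CodeBlock &block)
{
    return "<pre class=\"" + block.language + "\">"
            + escapeMarkup(block.source) + "</pre>";
}

// Hypothetical DocBook converter for the same representation.
std::string toDocBook(const CodeBlock &block)
{
    return "<programlisting language=\"" + block.language + "\">"
            + escapeMarkup(block.source) + "</programlisting>";
}
```

The structural work of finding the code block happens once, upstream of
both converters; only the final markup differs per format.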

In practice this is intended to be introduced with the following code structure:

- A new directory, "output", is introduced at the root of QDoc's source code.
  The directory is intended to contain all subsystems related to
  producing the final documentation.
  - A subdirectory "primitives" is introduced under "output".
    The directory is intended to contain all representations of `primitive` objects.
    - `primitive` objects are slim POD types that represent the
      AST of the newly introduced intermediate representation.
  - A subdirectory "formats" is introduced under "output".
    The directory is intended to contain all `formatter` objects.
    - `formatter` objects are initially stateless, slim objects that
      respect a common interface and are concerned with
      converting the newly introduced intermediate
      representation to a format-specific representation.
  - A subdirectory "components" is introduced under "output".
    The directory contains all `components`.
    - `components` are slim function objects, naked functions,
      or similar constructs that convert QDoc's
      data structures to the new intermediate representation,
      bridging the gap between the generators and the
      `formatters`.
- A `formatter` object is passed to each generator as a dependency at
  construction time.
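
The three kinds of objects and the construction-time dependency could
fit together roughly as follows. This is only a sketch under
assumptions: `Text`, `Section`, `Formatter`, `HtmlFormatter`,
`DocumentedEntity` and `describe` are hypothetical names, and the
`Generator` shown here is a toy stand-in for QDoc's real base class.

```cpp
#include <string>
#include <vector>

// output/primitives: slim, behavior-free data types (hypothetical names).
struct Text {
    std::string content;
};

struct Section {
    std::string title;
    std::vector<Text> paragraphs;
};

// output/formats: the interface every formatter respects (hypothetical).
class Formatter
{
public:
    virtual ~Formatter() = default;
    virtual std::string format(const Text &text) const = 0;
    virtual std::string format(const Section &section) const = 0;
};

// One stateless implementation per format; escaping is omitted for brevity.
class HtmlFormatter : public Formatter
{
public:
    std::string format(const Text &text) const override
    {
        return "<p>" + text.content + "</p>";
    }

    std::string format(const Section &section) const override
    {
        std::string html = "<h2>" + section.title + "</h2>";
        for (const Text &paragraph : section.paragraphs)
            html += format(paragraph);
        return html;
    }
};

// Stand-in for one of QDoc's own data structures.
struct DocumentedEntity {
    std::string name;
    std::vector<std::string> remarks;
};

// output/components: a naked function from QDoc data to primitives.
Section describe(const DocumentedEntity &entity)
{
    Section section{entity.name, {}};
    for (const std::string &remark : entity.remarks)
        section.paragraphs.push_back(Text{remark});
    return section;
}

// The generator receives its formatter as a dependency at construction time.
class Generator
{
public:
    explicit Generator(const Formatter &formatter) : m_formatter(formatter) {}

    std::string generate(const DocumentedEntity &entity) const
    {
        return m_formatter.format(describe(entity));
    }

private:
    const Formatter &m_formatter;
};
```

With this shape, switching output formats means constructing the
generator with a different `Formatter`; `describe` is untouched.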

Then, a series of iterative changes are applied using the following procedure:

- A unit of code that is duplicated between the generator subclasses
  and produces a block of output is identified.
- The currently available `primitive` objects are examined to
  understand whether they can faithfully represent the block of output
  produced by the unit of code.
  - If this is not the case, a series of new `primitive` objects are
    introduced that are able to faithfully represent the block of
    output produced by the unit of code.
    - If any `primitive` is introduced, each `formatter` object is
      extended to provide a format-native representation of the
      `primitive`.
- One or more `components` are introduced, destructuring any
  required information into a representation made of `primitive`
  objects that faithfully represents the block of output produced by
  the unit of code.
- A method is introduced in the base class for generators.
  The method performs three steps:
  - It handles any prerequisites for the consistency of the production of the output,
    for example, copying image files to the output directory.
  - It produces a `primitive` representation through the use of a `component`.
  - It outputs the result of passing the produced `primitive`
    representation to the `formatter` dependency to any relevant
    output file.
- The original unit of duplicated code is replaced in each generator
  with a back-call to the newly introduced method in the base class
  for generators.
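
The three-step base-class method described above can be sketched for
the image case. Everything here is an illustrative assumption: `Image`,
`ImageRequest`, `imageFrom`, `generateImage` and the `CopyFile` callback
are hypothetical, and returning `false` without writing anything when
the copy fails is just one possible policy.

```cpp
#include <functional>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Primitive (hypothetical): an image as it should appear in the output.
struct Image {
    std::string path;
    std::string altText;
};

// Formatter dependency (hypothetical), reduced to the one primitive used here.
class Formatter
{
public:
    virtual ~Formatter() = default;
    virtual std::string format(const Image &image) const = 0;
};

class HtmlFormatter : public Formatter
{
public:
    std::string format(const Image &image) const override
    {
        return "<img src=\"" + image.path + "\" alt=\"" + image.altText + "\"/>";
    }
};

// Stand-in for the QDoc data that describes an image in the documentation.
struct ImageRequest {
    std::string sourceFile;
    std::string caption;
};

// Component: destructures the request into a primitive.
Image imageFrom(const ImageRequest &request)
{
    const auto slash = request.sourceFile.find_last_of('/');
    const std::string fileName = slash == std::string::npos
            ? request.sourceFile
            : request.sourceFile.substr(slash + 1);
    return Image{"images/" + fileName, request.caption};
}

class Generator
{
public:
    // Copies a source file into the output directory; false on failure.
    using CopyFile = std::function<bool(const std::string &sourceFile)>;

    Generator(const Formatter &formatter, CopyFile copyFile, std::ostream &output)
        : m_formatter(formatter), m_copyFile(std::move(copyFile)), m_output(output)
    {
    }
    virtual ~Generator() = default;

    // The method that replaces the duplicated unit in each subclass.
    bool generateImage(const ImageRequest &request)
    {
        if (!m_copyFile(request.sourceFile)) // 1. prerequisites
            return false;
        const Image image = imageFrom(request); // 2. primitive via a component
        m_output << m_formatter.format(image); // 3. formatted output
        return true;
    }

private:
    const Formatter &m_formatter;
    CopyFile m_copyFile;
    std::ostream &m_output;
};

// A subclass keeps only the back-call to the base class.
class HtmlGenerator : public Generator
{
public:
    using Generator::Generator;

    bool generateInlineImage(const ImageRequest &request)
    {
        return generateImage(request);
    }
};

// Usage: an in-memory stream stands in for an output file.
std::string renderExample()
{
    std::ostringstream page;
    HtmlFormatter formatter;
    HtmlGenerator generator(formatter, [](const std::string &) { return true; }, page);
    generator.generateInlineImage(ImageRequest{"doc/images/logo.png", "Logo"});
    return page.str();
}
```

Because the file copy is injected, the sketch can be exercised without
touching the file system, and the failure of the prerequisite is
visible to the caller instead of being buried in the output code.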

The iterative application of this process is intended to slowly move
bigger units of code under composed `components`, slowly replacing
most of the code in the generator subclasses.

Due to the original structure of the generators, any block of code
that produces a complete output can be replaced without any semantic
change with the above methodology.

This ensures that the refactoring can be introduced gradually,
destructuring the output from the inside out at different granularities
while slowly centralizing the structural representation of a
generation phase into the base class.

This same centralization normalizes the differences between the
generators, removing one of the original pain points.

The responsibility split between `components` and `formatters` ensures
that we do not incur the same duplication and extensibility
constraints as the original code.

Furthermore, the more granular approach to components ensures that the
generators' processing is divided into smaller units, which makes the
whole process easier to understand.

To reduce the risk of breakage, `components` can generally
replicate the available code in the generators with regard to
information processing.

At the same time, the decentralization of the processing logic allows
the required data to be expressed at the API boundaries, reducing
the amount of information that the developer needs to keep in mind when
examining the procedure. This limits the damage caused by the amount of
global mutable state that QDoc uses, albeit only partially.
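
As a small illustration of expressing the required data at the API
boundary, a `component` can name every input in its signature rather
than reading it from shared state. `Breadcrumbs` and `breadcrumbsFor`
are hypothetical names used only for this sketch.

```cpp
#include <string>
#include <vector>

// Hypothetical primitive: a navigation trail shown at the top of a page.
struct Breadcrumbs {
    std::vector<std::string> entries;
};

// Hypothetical component. Every input is named in the signature, so a
// reader does not need to know which global state would otherwise be
// consulted to produce the trail.
Breadcrumbs breadcrumbsFor(const std::string &projectName,
                           const std::vector<std::string> &ancestors,
                           const std::string &pageTitle)
{
    Breadcrumbs trail;
    trail.entries.push_back(projectName);
    trail.entries.insert(trail.entries.end(), ancestors.begin(), ancestors.end());
    trail.entries.push_back(pageTitle);
    return trail;
}
```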

The division of concerns and the more granular units are further
intended to provide an entry point for generation-phase tests.
`formatters` are intended to be fully testable, formalizing the output
that QDoc produces.
That is, they provide a followable code path for questions of the form
"What does my code block look like in the HTML output?".
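
A formatter test could then be little more than a table of inputs and
expected outputs. The sketch below assumes a hypothetical `formatHtml`
function and `CodeBlock` primitive and deliberately uses no test
framework; the actual tests would use whatever framework QDoc's test
suite already relies on.

```cpp
#include <string>
#include <vector>

// Hypothetical primitive.
struct CodeBlock {
    std::string source;
};

// Hypothetical HTML formatter function under test.
std::string formatHtml(const CodeBlock &block)
{
    std::string escaped;
    for (char c : block.source) {
        if (c == '<')
            escaped += "&lt;";
        else if (c == '>')
            escaped += "&gt;";
        else if (c == '&')
            escaped += "&amp;";
        else
            escaped += c;
    }
    return "<pre>" + escaped + "</pre>";
}

struct FormatterCase {
    CodeBlock input;
    std::string expected;
};

// Returns the sources of all cases whose formatted output differs from
// the expected one.
std::vector<std::string> failingCases(const std::vector<FormatterCase> &cases)
{
    std::vector<std::string> failures;
    for (const FormatterCase &testCase : cases) {
        if (formatHtml(testCase.input) != testCase.expected)
            failures.push_back(testCase.input.source);
    }
    return failures;
}

// The expectations double as a specification of the HTML output.
std::vector<FormatterCase> codeBlockCases()
{
    return {
        {CodeBlock{"int x = 0;"}, "<pre>int x = 0;</pre>"},
        {CodeBlock{"a < b && c > d"}, "<pre>a &lt; b &amp;&amp; c &gt; d</pre>"},
        {CodeBlock{""}, "<pre></pre>"},
    };
}
```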

`components` are expected to be partially testable, provided that a
large initial investment is made in test-level generators for QDoc's
data structures.

The generators are expected to remain untestable as a unit in
this phase.

Further concerns during the iterative refactoring should be the following:

- Revise the consistency between similar output layouts.
  Similar information in the current output might be represented quite differently,
  for example, the inherited members for QML and C++ elements.
- Bubble up side-effecting operations that may fail as far as possible.
  For example, the copying of images to the output directory cannot be
  handled effectively, as it is performed under a chain of
  calls to inner methods and in the middle of the production
  of the image-referring output. This results in inconsistencies
  in the HTML representation of a missing image if the
  operation silently fails.
- Revise the copious branching that is present in code that is
  shared between layouts,
  for example, the various branches in `generateRequisites`.
  Much of the branching could be moved outside the logic through
  case analysis, simplifying the actual processing in
  exchange for some possible code duplication.
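
Moving the branching outside the logic through case analysis might look
like the sketch below. It does not reproduce the real
`generateRequisites`: `Genus`, `Entity`, its fields and the three
functions are hypothetical and only illustrate the shape of the change.

```cpp
#include <string>
#include <utility>
#include <vector>

enum class Genus { Cpp, Qml };

// Stand-in for QDoc data; the fields are hypothetical.
struct Entity {
    Genus genus;
    std::string headerFile; // meaningful for C++ entities
    std::string importLine; // meaningful for QML entities
    std::string since;
};

using Requisites = std::vector<std::pair<std::string, std::string>>;

// Each case is a straight-line function with no genus checks inside.
Requisites cppRequisites(const Entity &entity)
{
    return {{"Header", "#include <" + entity.headerFile + ">"},
            {"Since", entity.since}};
}

Requisites qmlRequisites(const Entity &entity)
{
    return {{"Import Statement", "import " + entity.importLine},
            {"Since", entity.since}};
}

// The only branching left is one case analysis at the entry point.
Requisites requisitesFor(const Entity &entity)
{
    switch (entity.genus) {
    case Genus::Cpp:
        return cppRequisites(entity);
    case Genus::Qml:
        return qmlRequisites(entity);
    }
    return {};
}
```

The repeated "Since" entry is the kind of small duplication accepted in
exchange for processing code that can be read top to bottom.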

The expected result is one in which most of the code has been
centralized into the single base class for generators, with subclasses
mostly containing back-calls to this base class, allowing the removal
of the subclasses themselves.

Concerns for this change are the overhead computation introduced
when destructuring or converting data structures and an expected
increase in memory footprint.

Although this will need to be reassessed, it is expected that the gain
in code structure will outweigh the negatives.

The changes are additionally intended as preparatory to some of the
long-term goals currently considered for QDoc:

- Being able to sensibly parallelize, or make concurrent, the production of
  output files during the generation phase.
  This is intended to be explored as a consequence of the refactoring,
  if it is successful, to regain some of the performance lost to the
  introduced overhead.
- Being able to describe some of QDoc's operations through a
  well-defined interface that can later be exposed to allow for
  customization.
- Being able to have less monolithic phases that each produce
  a serializable intermediate representation that can be used to
  introduce incremental compilation.

The exploration of this task has been approved by QDoc's maintainer treinio.

Attachments

Issue Links

depends on

QTBUG-105384 Make a generator for the Node base class

In Progress


Consider refactoring QDoc's Generator classes
