Uploaded image for project: 'Qt'
  1. Qt
  2. QTBUG-123386

QmlFormat. Incorrect handling of some comments

    XMLWordPrintable

Details

    • Bug
    • Resolution: Unresolved
    • P3: Somewhat important
    • 6.8
    • 6.7
    • QML: Tooling
    • None
    • f2da7a80a (dev), 599f0714d (dev), 8acda3d8a (dev)

    Description

      Problems

      While working on QTBUG-117849 it was discovered that some of the comments during formatting are not handled properly. See documented failures (major and minor) here https://codereview.qt-project.org/c/qt/qtdeclarative/+/548311 .

      Briefly here are the problems:

      1. Comments around MethodParameters are lost. In both "standalone" ES and when ES is used inside QmlObject
      2. Inproper formatting when Comment is between the function keyword and the functionIdentifier (not a common case though, but there might be more common problems from that set or when it manifests it spoils the code)
      3. Commented Body of the Method or Class results in a loss or duplication and wrong placement in the "standalone" ES (relatively bad, but not really a common usecase maybe?), but in case of QmlObject a \n character is added (not bad, but just weird, haven't seen smth similar).
      4. If a Body of the item is commented, it moved up. (almost non-important, but for the sake of completeness of the observation and I personally haven't seen similar formatter behaviors)

      For standalone ES, Prettier (https://prettier.io/) can be used for references. I would expect ES inside the QMLObject should be handled in the identical way, however it's not currently possible due to the way it's stored[1]

      Solutions?

      I've spent 1-2 weeks attempting to debug the outlined problems in attempts to fix them, but discovering new ones along the way.

      There is a partial fix to the Problem1 (preserving Comments around MethodParameter-s in the standalone ES ) https://codereview.qt-project.org/c/qt/qtdeclarative/+/551909/9 . However, one commit in the chain introduces some refactoring, which is a bit in a style "make change and pray".

       But I'd like to admit that I didn't find "easy", "explainable" fixes and probably at the current state this is the case where accepting bugs / no refactoring is better than partial fixes and refactoring. What I mean by that is that, in my point of view, we prolly should take some time to properly understand and reconsider / refactor the way comments are handled and it's not a matter of just couple minor fixes/crutches.

      High level architecture (Current State)

      First of all, it's worth noting, that EcmaScript is stored differently depending on whether it's a part of the QmlObject (methods, bindings, etc.) or if it's a "stand-alone" ES (.js and .mjs files), hence formatting logic is also different[1].

      Second, on the high level, QmlFormat traverses through the DOM Items using "unified" writeOut API. However, on the lower level QmlFormat has, very roughly speaking, 2 hats, alternating between those

      1. Operating with FileLocationRegion-s with the help of OutWriter::writeRegion
      2. Operating with AST::Node-s employing ScriptFormatter and OutWriter::write

      Lastly, when it comes to the handling of Comments, QmlFormat imposes the following 2 rules:

      1. each Comment must be attached to either a Region (FileLocationRegion) or to a Node (AST::Node)
      2. each Comment is attached as either Pre comment (/a/a ) or Post comment (a/a/)

      If you're curious how the writeOut of comments is happening in slightly deeper details see [2]

      Taking all things above, most essential thing relevant to the comments is the way they are linked / attached and this is where unfortunately we have very weak understanding what's going on.

      Comments attachment

      I'm talking about CommentCollector::collectComments. Even though semih.yavuz  has made a very good advancement in that area (https://codereview.qt-project.org/c/qt/qtdeclarative/+/530939 ) the overall linking / matching algorithm is still quite hard to maintain and the work on the comments is failing under the category of "Make change and pray". Mostly because it lacks documentation, unit tests and modularity.

      However some minor pieces of documentation are still preserved inside the CommentLinker since the initial implementation and also some of the "high" level comments can be found at the top of the qqmldomcomments.cpp

      The general idea behind the attaching is the following:

      1. Prepare Nodes and Regions, to which comments can be attached.[3]
      2. Link each Comment with the corresponding Node or Region <-- here be dragons

      Dragons

      1. CommentLinker lacks encapsulation / modularity. For example CommentLinker takes in a constructor a reference to an integer (lastPostCommentPostEnd) which is used and modified during the linkCommentWithElement, however that integer variable outlives the lifetime of the CommentLinker.[4].
        Another opportunity for encapsulation and modularity is the way CommentLinker stores intermediately processed information.[5]
      2. findSpacesAroundComment is not tested and it's not clear whether there might be a cleaner/more straightforward approach.
      3. The whole algorithm of CommentLinker::linkCommentWithElement is not documented, not covered by unit tests. The high level order is not really clear, but lower level function like checkElementAfterComment, checkElementBeforeComment, checkElementInside are hard to debug, reason about and make changes (if you find it otherwise, pls reach out to me and help me to figure it out). There is no knowledge about the corner cases, but there are many questions whether there could be a leaner solution.

      Refactoring attempts / inspo

      1. Refactor CommentCollector API and introduce more modularity https://codereview.qt-project.org/c/qt/qtdeclarative/+/551774
      2. More "one step at a time" style refactoring of CommentCollector https://codereview.qt-project.org/c/qt/qtdeclarative/+/552710/1
      3. Rough attempt to introduce unit tests and figure out what's going on https://codereview.qt-project.org/c/qt/qtdeclarative/+/552703

      Notes / References

      [1] In the "stand-alone" ES use-case the content of the whole file (.js or .mjs) is managed through one ScriptExpression (see JsFile::JsFile), formatting of which, purely relies on the AST traversal by the ScriptFormatter.

      Methods of the QmlObject-s are managed through MethodInfo. It contains multiple ScriptExpression-s (one for body, one for each MethodParameter, one for returnType) and semanticScope. For more details on how it's constructed, you can take a look at QQmlDomAstCreator::visit(AST::FunctionDeclaration *fDef).

      In this case formatting is happening through quite complicated MethodInfo::writeOut and MethodParameter::writeOut

      Maybe at some point it will be easier to store the whole Method as just one ScriptExpression??

      [2] First let's consider Comments attached to the Regions, assuming they are attached to the correct region, they'll end up in the Fied::comments of a DomItem. Fied::comments is a  QMap<FileLocationRegion, CommentedElement>. DomItem::writeOut follows the Visitor Pattern

      void DomItem::writeOut(OutWriter &ow) const
      {
          writeOutPre(ow);
          visitEl([this, &ow](auto &&el) { el->writeOut(*this, ow); });
          writeOutPost(ow);
      } 

       

      From DomItem::writeOutPre we get to the OutWriter::itemStart(const DomItem &it), there new OutWriterState is added and OutWriterState::pendingComments are initialised by DomItem.field(Field::comments) Then "Pre" comments will be writtenOut as part of the OutWriter::regionStart and "Post" comments will be writtenOut as part of the OutWriter::regionEnd

       

      When Comments are attached to the Nodes, the code flow is somewhat simpler. Comments which are properly attached to the Nodes end up in the m_astComments member of the ScriptExpression. When QmlFormat needs to format a DomItem, which is ScriptExpression, }}in the {{ScriptExpression::writeOut it instantiates ScriptFormatter (AST::JSVisitor), passing it m_astComments.
      When traversing a Node ScriptFormatter checks in the preVisit function if a Node has an attached CommentedElement using {{m_astComments. }}If it's a "Pre" comment it'll be writtenOut right away, if it's "Post" it'll be added as an PostVisit operation.

       

      [3]  This happens with the help of AstRangesVisitor, which stores each "Element"(either Region or Node) in a form of "Ranges", however there is a shortcut. It stores a map <indexOfElementStart - Element> m_starts and <indexOfElementEnd - Element> m_ends).

       

      [4] see the usage of lastPostCommentPostEnd and m_lastPostCommentPostEnd

       

      [5] Even though some of the coupling most likely imposed by the complexity of how ASTRangesVisitor deals with ranges of "Elements", but it's still might be beneficial to separate handling / storing Comment-specific Info (like CommentLocation, m_spaces) and m_ranges. Currently everything is initialised altogether in the CommentLinker constructor. For me personally it's also quite hard to have Iterators initialized at the constructor stage...

      Attachments

        Issue Links

          For Gerrit Dashboard: QTBUG-123386
          # Subject Branch Project Status CR V

          Activity

            People

              qtqmlteam Qt Qml Team User
              dima.a Dmitrii Akshintsev
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated: