Uploaded image for project: 'Qt'
  1. Qt
  2. QTBUG-81348

Improve performance of moc on Windows, perhaps by allowing multiple input files

    XMLWordPrintable

Details

    • Suggestion
    • Resolution: Unresolved
    • P3: Somewhat important
    • None
    • 5.12.5, 5.14.0
    • Build tools: moc
    • None
    • Windows 10 build 1903 and 1910

      msvc 2017

    • Windows

    Description

      tl;dr:

      Satisfying #include files is a substantial bottleneck for moc on Windows. This bottleneck is particularly problematic when some or all of the following conditions is true: multiple moc processes are running at once; certain UWP (Universal Windows Platform) applications are running; mocable files #include more files than strictly necessary. Changes to how a project is moced can reduce the build time of an application substantially.

      The measurements made below were done using a 64-bit build of Qt 5.12.5 but I have tested 5.14.0 and get similar results.

      Problem:

      mocing a large, complicated project with lots of mocable files can be very slow on Windows. This is primarily because each mocable file is moced by a separate instance of moc. The way that moc satisfies #include statements is that it generates a proposed full path by iterating through all paths in INCLUDEPATH and appending a leaf name to the base path. moc then tests whether the proposed file exists. This can result in hundreds or thousands of attempts to access an invalid file path during the duration of a single moc process. moc does cache the files that it finds, but this cache is not shared between moc processes.

      Some numbers from an actual project with about 550 mocable files and 50 paths in INCLUDEPATH (plus about 10 more added by Qt's regular moc implementation):

      number of attempts to open a file or directory that is invalid: 20.6 milion
      number of attempts to open a file or directory that are successful: 2.3 million
      Total time attributed to failures: 355 seconds (cpu time)
      Total time attributed to successes: 32 seconds (cpu time)
      Total wall clock time attributed to all moc.exe calls: 67 seconds

       

      The above numbers were for a debug build using the msvc 2017 compiler on Windows 10 build 1903. The machine has a 16 core/32 thread AMD ThreadRipper processor, 32 GB RAM, and NVMe storage.

      Profiling data collected during the build above shows that moc.exe spends >95% of the time satisfying #include statements, and almost all of this time is spent in the Windows API call GetFileAttributesEx, so the moc code itself is not directly to blame (see note 1).

      Additional complication:

      Starting with Windows 10 build 1903, certain UWP applications can cause additional minifilter drivers to be started. Minifilter drivers insert themselves into file I/O operations and we have observed that certain minifilter drivers, when active, can substantially increase the execution time of moc.

      A list of active minifilter drivers can be obtained by opening a command prompt using Run As Administrator and executing the fltmc command. Entries with a non-zero value in the "Num Instances" column are active. Our initial investigations suggested that the CldFlt filter driver caused moc delays while mocing files, but disabling that driver by executing 

      sc config CldFlt start=disabled
      

      prevented CldFlt from loading. More recently, we have seen substantial delays when the bindflt filter driver is running. This driver, even if disabled using 

      sc config bindflt start=disabled
      

      can be started by the OS when either the MicrosoftOfficeHub or XboxGameOverlay UWP application is started (there may be others). Based on event logs (see note 2), the former can be started without any user intervention. With a default installation of Windows 10 Home edition, the game bar is started with the Windows key + G shortcut, which may be mistakenly triggered when Ctrl+G (find again) is intended. It seems likely that XboxGameOverlay can be started without any user intervention, because the bindflt driver sometimes starts on our automated build machine when no person has used the machine for days. There may also be other UWP applications that start the bindflt filter driver.

      On the same 16 core machine mentioned above, the full debug build time of our application is substantially longer when the bindflt driver is loaded:

      bindflt OFF 2:55
      bindflt ON 7:05

      Sampling data collected indicates that the extra time is almost entirely spent in moc.exe, and as before the call to GetFileAttributesEx consumed most of that time.

       
      I also tested the build times of our application on an older laptop with a 2 core Intel Core i7-6560U processor (2.2 GHz), 16 GB, SSD hard drive. On this machine the impact of bindflt was noticeable, though not nearly as large as on the more powerful machine. I hypothesize that this is due to fewer moc.exe processes running in parallel.

      bindflt OFF: 14:06
      bindflt ON: 15:14

      Bindflt running also has a large impact on the compile times of Qt Creator. Here are the debug build times for a Git checkout of Qt creator, using Qt 5.12.5 with msvc2017. This is on the same 16-core machine mentioned above:

      bindflt OFF: 4:45
      bindflt ON: 7:30

       

      Possible Solutions:

      The simple approaches to reducing moc time include reducing the number of paths in INCLUDEPATH, removing #include statements from mocable files when possible, and reducing the total number of files that need to be moced. Some of these steps could likely be taken for our application to improve its build time, but for the sake of discussion let's assume that Qt Creator has already taken these measures yet still can cause moc.exe to take a very long time, at least in the case when bindflt is on.

      In 2016, a [change|https://codereview.qt-project.org/c/qt/qtbase/+/160755] to moc was proposed to allow multiple input files to be moced by one instance of moc. This change was later abandoned, in part because it would have required qmake changes to make it work correctly. Ultimately, I think a change to moc that allows it to take a list of files to be moced as an input is likely to be the best one, since it would allow moc to use the same map of #include name to actual file.

      For our application, I have come up with a workable but imperfect solution that concatenates all mocable headers together at build time and then runs moc on the generated header files. In testing builds of our application on the same 16-core Windows machine mentioned above, here are the build times I get using the attached moccombinepp feature versus the default moc feature:

        bindflt OFF bindflt ON
      Default moc: 2:55 7:05
      moccombinepp: 2:09 2:20

      So this approach substantially decreases the build time with and without bindflt active, but especially with bindflt ON.

      This conclusion is also true for the 2 core Windows 10 laptop mentioned above:

        bindflt OFF bindflt ON
      Default moc: 14:06 15:14
      moccombinepp: 7:24 7:44

      Our project is organized as several subdir projects. We have about 5 small projects (1-2 mocable files each) and 1 giant project (~550 mocable files). In the .pro file of the giant project only, I have added this:

      QT.core.CONFIG -= moc
      CONFIG += moccombinepp

      Adding the same to the .pro files of the small projects results in compile errors, but I can use the attached moccombine feature in the small projects and successfully build.

      I wrote the code in the two attached feature files myself (moccombine.prf and moccombinepp.prf) but anyone may freely use the code in whatever way they see fit. I don't think either of the current features is robust enough to be added to Qt directly, but perhaps someone with more familiarity with qmake and moc could modify them in a way that makes them more broadly useful.

      For the sake of completeness, I tested debug builds of the same application on two Macintosh machines (both iMacs, one with 4 cores and one with 6 cores). With the moccombinepp feature enabled, the build was about 20-25 seconds faster (out of a total of between 4:30 to 5:40 depending on which machine I was using). So improving how moc is run might decrease build times on Macintosh slightly, but not nearly as much as on Windows.

      Conclusion:

      Changing the way that moc commands are generated can substantially decrease the build time of a complicated project on Windows 10. While not tested here, changing moc to allow multiple input files to be specified would likely have a similar effect and may be more robust. As more machines receive the update to Windows 10 1903 (most of our machines did not start getting this update until August of 2019) the problem of bindflt causing major increases in moc time may become more common.

      Note 1:

      Since moc and the compiler must both satisfy #include declarations, I wondered why the msvc compiler (cl.exe) was able to do so faster and without being impacted by bindflt. I collected sampling data from cl.exe during a build and it appears that the cl.exe::CheckFileExists function makes many calls to the Windows API function [ZWQueryDirectoryFile|https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-zwquerydirectoryfile]. My guess is that the file system is able to quickly check for the existence of a file when given a handle to the directory and a leaf file name but is much slower to return when GetFileAttributesEx is called, since the later presumably requires the file system to walk down the path from the root to the leaf.

      Note 2:

      To determine which UWP applications caused bindflt to start, I opened the Windows Event Viewer application using Run as Administrator. I then chose the Open Saved Log... action from the upper right side to load Microsoft-Windows-AppModel-Runtime%4Admin.evtx and Microsoft-Windows-Containers-BindFlt%4Operational.evtx, both found within C:\Windows\System32\winevt\Logs. I then looked for events with Event ID of 2 in the Containers-BindFlt file and found events with the same time and date in the AppModel-Runtime file.

      Attachments

        1. moccombine.prf
          8 kB
        2. moccombinepp.prf
          10 kB
        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          People

            fabiankosmale Fabian Kosmale
            aclight Adam Light
            Votes:
            5 Vote for this issue
            Watchers:
            10 Start watching this issue

            Dates

              Created:
              Updated:

              Gerrit Reviews

                There are no open Gerrit changes