QTBUG-104493

Application performance drops if many threads are running while QProcess creates sub-processes on Unix


Details

    • Type: Bug
    • Resolution: Done
    • Priority: P2: Important
    • Fix Version/s: 6.4.0 RC1, 6.5.0 Beta1
    • Affects Version/s: 5.15.2, 6.3.1, 6.4
    • Component/s: Core: I/O
    • Labels: None
    • Commits: e1a787a76e (qt/qtbase/dev), e1a787a76e (qt/tqtc-qtbase/dev), 0c9fc4bfa7 (qt/qtbase/6.4), 0c9fc4bfa7 (qt/tqtc-qtbase/6.4), 29b2fe40d (dev), 65528387d (6.5), c5221f6be (dev), c5c771291 (dev), 7c4e271fe (dev), 9962441bf (6.7), f0e4f50fd (6.6)

    Description

      This was first detected in KWrite / Kate; see the kwrite-devel mailing list.

      Quoting Milian Wolff in message 3991116.kMKLVPbKup@milian-workstation:

      OK, this is apparently totally unrelated to git and kate. Thiago, do you
      happen to have an insight here maybe? Is it known that using QProcess can
      really badly influence the runtime behavior of malloc in other threads?

      Here's a small example to trigger this behavior already:

      https://invent.kde.org/-/snippets/2239
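
      (For readers without access to the snippet: a minimal sketch of what such a
      reproducer could look like, assuming it spawns one worker per core that churns
      malloc and, with --with-subprocess, also runs `ls /tmp` about a hundred times.
      The thread count, iteration counts and allocation sizes below are illustrative;
      the actual snippet may differ.)

      #include <QCoreApplication>
      #include <QProcess>
      #include <QThread>
      #include <cstdlib>
      #include <cstring>
      #include <vector>

      // Each worker keeps the allocator busy; with --with-subprocess it also
      // launches a trivial external process a hundred times.
      static void worker(bool withSubprocess)
      {
          for (int i = 0; i < 100; ++i) {
              // Allocate and touch fresh memory so malloc keeps taking page faults.
              for (int j = 0; j < 1000; ++j) {
                  void *p = std::malloc(64 * 1024);
                  std::memset(p, 0, 64 * 1024);
                  std::free(p);
              }
              if (withSubprocess) {
                  QProcess proc;
                  proc.start("ls", { "/tmp" });   // fast on its own, see the timing further down
                  proc.waitForFinished();
              }
          }
      }

      int main(int argc, char **argv)
      {
          QCoreApplication app(argc, argv);
          const bool withSubprocess = app.arguments().contains("--with-subprocess");

          std::vector<QThread *> threads;
          for (int i = 0; i < QThread::idealThreadCount(); ++i)
              threads.push_back(QThread::create(worker, withSubprocess));
          for (QThread *t : threads)
              t->start();
          for (QThread *t : threads) {
              t->wait();
              delete t;
          }
          return 0;
      }
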

      I have nproc == 24. Let's run this without any external processes:

      $ perf stat -r 5 ./slow-malloc 
       Performance counter stats for './slow-malloc' (5 runs):
      
                6,868.17 msec task-clock                #   12.781 CPUs utilized            ( +-  0.82% )
                  35,262      context-switches          #    5.078 K/sec                    ( +-  0.73% )
                   1,518      cpu-migrations            #  218.590 /sec                     ( +- 10.47% )
                 477,765      page-faults               #   68.797 K/sec                    ( +-  0.23% )
          27,414,859,033      cycles                    #    3.948 GHz                      ( +-  0.88% )  (84.46%)
           9,269,828,127      stalled-cycles-frontend   #   33.46% frontend cycles idle     ( +-  0.80% )  (84.58%)
           2,503,409,257      stalled-cycles-backend    #    9.04% backend cycles idle      ( +-  1.38% )  (82.85%)
          12,211,168,505      instructions              #    0.44  insn per cycle         
                                                        #    0.77  stalled cycles per insn  ( +-  0.26% )  (82.54%)
           2,699,403,475      branches                  #  388.710 M/sec                    ( +-  0.34% )  (82.99%)
               7,276,801      branch-misses             #    0.27% of all branches          ( +-  0.68% )  (84.56%)
      
                 0.53735 +- 0.00317 seconds time elapsed  ( +-  0.59% )
      

      So far so good. Now let's also run `ls /tmp`, which by itself is plenty fast:

      $ time ls /tmp
      
      real    0m0.006s
      user    0m0.000s
      sys     0m0.006s
      

      Doing that a hundred times per thread as in the example file above should only
      take ~600ms. But instead this is what I observe:

      $ perf stat -r 5 ./slow-malloc --with-subprocess
      
       Performance counter stats for './slow-malloc --with-subprocess' (5 runs):
      
               26,197.00 msec task-clock                #    4.373 CPUs utilized            ( +-  0.29% )
                 148,400      context-switches          #    5.669 K/sec                    ( +-  2.19% )
                  11,287      cpu-migrations            #  431.174 /sec                     ( +-  2.25% )
               1,559,820      page-faults               #   59.587 K/sec                    ( +-  0.22% )
          99,501,234,050      cycles                    #    3.801 GHz                      ( +-  0.15% )  (85.67%)
          30,922,803,968      stalled-cycles-frontend   #   31.18% frontend cycles idle     ( +-  0.17% )  (85.00%)
          21,809,486,987      stalled-cycles-backend    #   21.99% backend cycles idle      ( +-  0.74% )  (84.85%)
          62,524,522,174      instructions              #    0.63  insn per cycle         
                                                        #    0.49  stalled cycles per insn  ( +-  0.17% )  (84.84%)
          14,128,484,480      branches                  #  539.721 M/sec                    ( +-  0.27% )  (85.23%)
             114,841,497      branch-misses             #    0.82% of all branches          ( +-  0.26% )  (85.86%)
      
                  5.9904 +- 0.0258 seconds time elapsed  ( +-  0.43% )
      

      And perf off-CPU profiling with hotspot again shows the excessive wait time in
      rwsem_down_read_slowpath when _int_malloc hits the asm_exc_page_fault code.

      Any insight would be welcome, or suggestions on how to better handle this in
      user code.
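
      (That call chain is consistent with contention on the kernel's mmap lock:
      fork() holds it for writing while it duplicates the parent's page tables, and
      every page fault taken by _int_malloc in the other threads needs the same lock
      for reading. A purely illustrative, Qt-free sketch of the same pattern, plain
      fork()/exec() in one thread while the other threads keep faulting in newly
      malloc'ed pages, can be used to check whether the behaviour is specific to
      QProcess; it is not part of the original report.)

      #include <cstdlib>
      #include <cstring>
      #include <thread>
      #include <vector>
      #include <sys/wait.h>
      #include <unistd.h>

      // One thread fork()/exec()s repeatedly (fork copies the page tables with the
      // mmap lock held for writing) while the remaining threads keep faulting in
      // freshly malloc'ed pages (each fault takes the same lock for reading).
      // Iteration counts and sizes are arbitrary.
      int main()
      {
          std::vector<std::thread> threads;
          for (unsigned i = 1; i < std::thread::hardware_concurrency(); ++i) {
              threads.emplace_back([] {
                  for (int j = 0; j < 100000; ++j) {
                      void *p = std::malloc(64 * 1024);
                      std::memset(p, 0, 64 * 1024);   // touch the pages
                      std::free(p);
                  }
              });
          }

          for (int i = 0; i < 100; ++i) {
              pid_t pid = fork();
              if (pid == 0) {
                  execlp("ls", "ls", "/tmp", static_cast<char *>(nullptr));
                  _exit(127);
              }
              waitpid(pid, nullptr, 0);
          }

          for (std::thread &t : threads)
              t.join();
          return 0;
      }
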

    People

      Assignee: Thiago Macieira
      Reporter: Thiago Macieira