Details
Bug
Resolution: Done
P2: Important
5.15.2, 6.3.1, 6.4
None
e1a787a76e (qt/qtbase/dev) e1a787a76e (qt/tqtc-qtbase/dev) 0c9fc4bfa7 (qt/qtbase/6.4) 0c9fc4bfa7 (qt/tqtc-qtbase/6.4), 29b2fe40d (dev), 65528387d (6.5), c5221f6be (dev), c5c771291 (dev), 7c4e271fe (dev), 9962441bf (6.7), f0e4f50fd (6.6)
Description
This was first detected in KWrite / KATE, see kwrite-devel mailing list.
Quoting Milian Wolff on 3991116.kMKLVPbKup () milian-workstation:
OK, this is apparently totally unrelated to git and kate. Thiago, do you
happen to have an insight here maybe? Is it known that using QProcess can
really badly influence the runtime behavior of malloc in other threads?

Here's a small example to trigger this behavior already:
https://invent.kde.org/-/snippets/2239

I have nproc == 24. Let's run this without any external processes:

    $ perf stat -r 5 ./slow-malloc

     Performance counter stats for './slow-malloc' (5 runs):

              6,868.17 msec task-clock                #   12.781 CPUs utilized            ( +-  0.82% )
                35,262      context-switches          #    5.078 K/sec                    ( +-  0.73% )
                 1,518      cpu-migrations            #  218.590 /sec                     ( +- 10.47% )
               477,765      page-faults               #   68.797 K/sec                    ( +-  0.23% )
        27,414,859,033      cycles                    #    3.948 GHz                      ( +-  0.88% )  (84.46%)
         9,269,828,127      stalled-cycles-frontend   #   33.46% frontend cycles idle     ( +-  0.80% )  (84.58%)
         2,503,409,257      stalled-cycles-backend    #    9.04% backend cycles idle      ( +-  1.38% )  (82.85%)
        12,211,168,505      instructions              #    0.44  insn per cycle
                                                      #    0.77  stalled cycles per insn  ( +-  0.26% )  (82.54%)
         2,699,403,475      branches                  #  388.710 M/sec                    ( +-  0.34% )  (82.99%)
             7,276,801      branch-misses             #    0.27% of all branches          ( +-  0.68% )  (84.56%)

               0.53735 +- 0.00317 seconds time elapsed  ( +-  0.59% )

So far so good. Now let's also run `ls /tmp`, which by itself is plenty fast:

    $ time ls /tmp
    real    0m0.006s
    user    0m0.000s
    sys     0m0.006s

Doing that a hundred times per thread as in the example file above should only
take ~600ms. But instead this is what I observe:

    $ perf stat -r 5 ./slow-malloc --with-subprocess

     Performance counter stats for './slow-malloc --with-subprocess' (5 runs):

             26,197.00 msec task-clock                #    4.373 CPUs utilized            ( +-  0.29% )
               148,400      context-switches          #    5.669 K/sec                    ( +-  2.19% )
                11,287      cpu-migrations            #  431.174 /sec                     ( +-  2.25% )
             1,559,820      page-faults               #   59.587 K/sec                    ( +-  0.22% )
        99,501,234,050      cycles                    #    3.801 GHz                      ( +-  0.15% )  (85.67%)
        30,922,803,968      stalled-cycles-frontend   #   31.18% frontend cycles idle     ( +-  0.17% )  (85.00%)
        21,809,486,987      stalled-cycles-backend    #   21.99% backend cycles idle      ( +-  0.74% )  (84.85%)
        62,524,522,174      instructions              #    0.63  insn per cycle
                                                      #    0.49  stalled cycles per insn  ( +-  0.17% )  (84.84%)
        14,128,484,480      branches                  #  539.721 M/sec                    ( +-  0.27% )  (85.23%)
           114,841,497      branch-misses             #    0.82% of all branches          ( +-  0.26% )  (85.86%)

                5.9904 +- 0.0258 seconds time elapsed  ( +-  0.43% )

And perf off-CPU profiling with hotspot again shows the excessive wait time in
rwsem_down_read_slowpath when _int_malloc hits the asm_exc_page_fault code.

Any insight would be welcome, or suggestions on how to better handle this in
user code.
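
For readers without access to the linked snippet, the scenario described above can be approximated without Qt. The following is a minimal sketch, not the original reproducer: it uses plain fork()/exec in place of QProcess, and the names spawn_ls, run_workload and Result, as well as the block size and count, are hypothetical choices made for illustration. One plausible reading of the rwsem_down_read_slowpath wait is that fork() holds the parent's memory-map lock for writing while it duplicates the address space, so page faults in the allocating threads, which need that lock for reading, have to wait; the sketch only sets up that contention and does not prove the cause.

```cpp
// Sketch only (assumption): fork()/exec stands in for QProcess.
#include <atomic>
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <cstring>
#include <thread>
#include <vector>
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

// Run `ls /tmp` with its output discarded; true if it exited with status 0.
static bool spawn_ls()
{
    pid_t pid = fork();
    if (pid < 0)
        return false;
    if (pid == 0) {
        // Child: do as little as possible between fork() and exec.
        int devnull = open("/dev/null", O_WRONLY);
        if (devnull >= 0) {
            dup2(devnull, STDOUT_FILENO);
            dup2(devnull, STDERR_FILENO);
        }
        execlp("ls", "ls", "/tmp", static_cast<char *>(nullptr));
        _exit(127); // exec failed
    }
    int status = 0;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);
    } while (r < 0 && errno == EINTR);
    return r == pid && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

struct Result {
    long allocations; // blocks successfully allocated and touched
    int spawned;      // subprocesses that ran and exited with status 0
};

// Each thread repeatedly allocates and touches memory (forcing page faults)
// and, if requested, launches one subprocess per iteration.
Result run_workload(int threads, int iterations, bool with_subprocess)
{
    assert(threads > 0 && iterations > 0);
    constexpr int kBlocks = 256;         // hypothetical: blocks per iteration
    constexpr std::size_t kSize = 4096;  // hypothetical: one page per block
    std::atomic<long> allocations{0};
    std::atomic<int> spawned{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                std::vector<void *> blocks;
                for (int b = 0; b < kBlocks; ++b) {
                    void *p = std::malloc(kSize);
                    if (!p)
                        break;
                    std::memset(p, 0xab, kSize); // touch the memory
                    blocks.push_back(p);
                    ++allocations;
                }
                for (void *p : blocks)
                    std::free(p);
                if (with_subprocess && spawn_ls())
                    ++spawned;
            }
        });
    }
    for (auto &th : pool)
        th.join();
    return {allocations.load(), spawned.load()};
}
```

Calling run_workload(24, 100, false) and run_workload(24, 100, true) from a small main() and timing each call (for example under perf stat) gives a comparison along the lines of the two runs above; absolute numbers will differ by machine, kernel and allocator.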
Issue Links
resulted in:
- QTBUG-111964 QProcess::start does not return if setChildModifier hangs (Closed)
- QTBUG-117954 QProcess::startDetached quits application when running with ASAN (Closed)
- QTBUG-117533 QProcess does not work with thread sanitizer (Closed)