Details
-
Bug
-
Resolution: Fixed
-
P2: Important
-
6.5
-
None
-
Apple M1 compiled with mainline clang 15.0.6 (brew installed)
-
-
c333d4108 (dev), e3948d2cf (6.5)
Description
The QSGBatchRenderer calculation of the Z position during batch upload can produce a near-zero negative value on Apple M1 chips, which causes the top-most element to be clipped.
The issue seems to be specific to arm64 and is caused by a change to compiler optimizations in clang 14 that replaces the `fmul` -> `fsub` instruction sequence with a single `fmsub` instruction (fused multiply-subtract, part of the FMA family). Because `fmsub` rounds only once, after the subtraction, it regularly produces near-zero negative values (e.g., -5.48865e-17) in calculations like those for zDepth in the batch renderer:
float zorder = 1.0f - e->order * m_zRange;
where
m_zRange = 1.0 / max(e->order);
This means the top-most element, whose order is N, should always have a 'zorder' of 1.0 - N * (1.0 / N) = 0.0.
However, `fmsub` produces a near-zero negative number for about half of the values of N up to 2^32, with N = 5 being the smallest affected value.
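The proportion of affected values can be checked without an arm64 machine. The following is a minimal sketch, not code from the renderer: `fusedTopZ` and `countNegativeTopZ` are hypothetical helper names, and `std::fma` stands in for `fmsub` since both round only once. It uses `double` on the assumption that the reproduction does too, because the magnitudes quoted in this report match double-precision rounding.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: z value of the top-most element when maxOrder
// elements are stacked, computed with a single rounding step.
// std::fma(-a, b, c) evaluates c - a * b exactly and rounds once,
// which is how a fused multiply-subtract behaves.
double fusedTopZ(std::uint32_t maxOrder)
{
    assert(maxOrder > 0);
    const double zRange = 1.0 / maxOrder;
    return std::fma(-static_cast<double>(maxOrder), zRange, 1.0);
}

// Counts how many N in [first, last] give a negative top-most z value.
// The loop variable is 64-bit so that last == UINT32_MAX terminates.
std::uint64_t countNegativeTopZ(std::uint32_t first, std::uint32_t last)
{
    std::uint64_t count = 0;
    for (std::uint64_t n = first; n <= last; ++n) {
        if (fusedTopZ(static_cast<std::uint32_t>(n)) < 0.0)
            ++count;
    }
    return count;
}
```

Calling `countNegativeTopZ(5, 1000000)` and dividing by the size of the range gives an estimate of the affected fraction on any platform, regardless of which instructions the compiler emits for the plain expression.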
This code sample shows the change in assembly from clang 13 to 14: https://godbolt.org/z/TnKc834Kn
See assembly lines 35-36 on clang 13 and line 35 on clang 14.
Apple's build of clang 14 does not use this optimization and still generates the `fmul` -> `fsub` sequence.
See text-clip-reproduction.qml and the attached images for an example of how the `fmsub` optimization bug manifests in practice.
In this case five layers are enough to show the error: four nested Rectangles and a TextEdit field.
The TextEdit field's 'order' is 5, so the calculated zDepth should be 0.0 (1.0 - 5 * 0.2).
With the `fmsub` instruction, this calculation results in a value of `-5.55112e-17` and the text gets clipped as a result.
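The difference between the two instruction sequences for this five-layer case can be sketched in portable C++. This is an illustration under assumptions, not the renderer's code: `unfusedZ` and `fusedZ` are hypothetical names, `double` is used because the value quoted above matches double-precision rounding, and `std::fma` emulates `fmsub`. The `volatile` intermediate is there only to stop the compiler from contracting the two-step version into an FMA.

```cpp
#include <cassert>
#include <cmath>

// Two-step evaluation (fmul, then fsub): the product is rounded to the
// nearest double before the subtraction happens.
double unfusedZ(int order, int maxOrder)
{
    assert(maxOrder > 0);
    const double zRange = 1.0 / maxOrder;
    volatile double product = order * zRange; // forces the rounded product to be stored
    return 1.0 - product;
}

// Fused evaluation (emulating fmsub): 1.0 - order * zRange is computed
// exactly and rounded only once at the end.
double fusedZ(int order, int maxOrder)
{
    assert(maxOrder > 0);
    const double zRange = 1.0 / maxOrder;
    return std::fma(-static_cast<double>(order), zRange, 1.0);
}
```

For `order = 5` and `maxOrder = 5`, the nearest double to 0.2 is slightly larger than 0.2, so the exact product is 1 + 2^-54. The two-step version rounds that product to 1.0 and returns 0.0, while the fused version keeps the excess and returns -2^-54, which is approximately -5.55112e-17.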