Details
Bug
Resolution: Unresolved
P3: Somewhat important
None
6.7.3
None
Arch Linux (up to date) and Windows 11 (up to date).
Description
The HTML parser that QTextEdit uses mishandles some non-BMP characters in the HTML, even when those characters are encoded as HTML entities. A minimal one-line HTML snippet that reproduces the issue is below, along with a simple PySide wrapper to run it.
from PySide6.QtWidgets import QTextEdit, QApplication

app = QApplication([])
w = QTextEdit()
# The HTML is pure ASCII: each shell is a numeric character reference for U+1F41A.
w.setHtml('<p id="x">&#x1f41a;&#x1f41a;</p>')
w.show()
app.exec()
This displays as two question marks followed by the spiral shell character U+1F41A (🐚) instead of two spiral shell characters. Note that id="x" is needed: the bug does not reproduce without it, which tells me it's dependent on some internal parser state. I have used Python as a convenient way to reproduce the issue, but it exists in C++ as well, since the string being passed to setHtml() is pure ASCII. I have also attached a screenshot of the buggy rendering on my system.
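For comparison, here is a small standard-library sketch of what a correct decoder should produce for this input. It does not use Qt and does not diagnose the bug. It only checks the pure-ASCII claim and shows that U+1F41A lies outside the BMP, so it needs a surrogate pair in UTF-16, the encoding QString uses internally. The hexadecimal entity form `&#x1f41a;` and the helper name `utf16_units` are assumptions made for illustration.

```python
import html

# Assumption: hexadecimal numeric character references. The report only says
# that the characters are written as HTML entities.
SNIPPET = '<p id="x">&#x1f41a;&#x1f41a;</p>'


def utf16_units(text: str) -> list[int]:
    """Return the UTF-16 code units of `text` as integers (hypothetical helper)."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# Decode a single entity the way a conforming HTML decoder would.
decoded = html.unescape("&#x1f41a;")

print(SNIPPET.isascii())                       # the input contains no non-ASCII bytes
print(f"U+{ord(decoded):04X}")                 # code point of the decoded character
print([hex(u) for u in utf16_units(decoded)])  # surrogate pair in UTF-16
```

A correct renderer should end up with two copies of that surrogate pair in the paragraph text, one per shell character.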