Qt / QTBUG-129836

QTextEdit HTML parser broken in presence of non-BMP unicode characters

Details

    • Bug
    • Resolution: Unresolved
    • P3: Somewhat important
    • None
    • 6.7.3
    • None
    • Arch Linux up-to-date and Windows 11 up-to-date.
    • All

    Description

      The HTML parser used by QTextEdit mishandles some non-BMP characters in the HTML, even when those characters are encoded as HTML entities. A minimal one-line HTML snippet that reproduces the issue is below, along with a simple PySide wrapper script.

      from PySide6.QtWidgets import QTextEdit, QApplication
      app = QApplication([])
      w = QTextEdit()
      w.setHtml('<p id="x">&#x1f41a;&#x1f41a;</p>')
      w.show()
      app.exec() 

      This displays as two question marks followed by the spiral shell character U+1F41A (🐚) instead of two spiral shell characters. Note that id="x" is needed: the issue doesn't reproduce without it, which suggests it depends on some internal parser state. I have used Python as a convenient way to reproduce the issue, but it exists in C++ as well, since the string being passed to setHtml() is pure ASCII. I have also attached a screenshot of the buggy rendering on my system.
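
      For reference, QString stores text as UTF-16, so a non-BMP character such as U+1F41A occupies two code units (a surrogate pair) rather than one. The sketch below uses only Python's standard library, not Qt's parser, to show what the entities in the snippet should decode to. The helper name utf16_units is made up for illustration, and nothing here identifies where the parser actually goes wrong.

```python
import html


def utf16_units(text: str) -> list[int]:
    """Return the UTF-16 code units that encode ``text`` (hypothetical helper)."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


# The same two entities used in the reproduction snippet.
snippet = "&#x1f41a;&#x1f41a;"

# Expected decoding: two U+1F41A characters.
decoded = html.unescape(snippet)

# Each U+1F41A becomes the surrogate pair 0xD83D 0xDC1A in UTF-16.
units = utf16_units(decoded)
print([hex(u) for u in units])  # ['0xd83d', '0xdc1a', '0xd83d', '0xdc1a']
```

      A correct rendering therefore corresponds to four UTF-16 code units forming two complete surrogate pairs.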

          People

            Assignee: Qt Quick and Widgets Team (qt.team.quick.subscriptions)
            Reporter: Kovid Goyal (kovidgoyal)
            Votes: 0
            Watchers: 2

              Gerrit Reviews

                There are no open Gerrit changes