Qt / QTBUG-135291

QDomText mis-encodes characters that require surrogate pairs

Details

    • Bug
    • Resolution: Unresolved
    • P2: Important
    • None
    • 5.15.18
    • XML: DOM
    • None
    • 5

    Description

      The XML spec says¹ that

      Characters referred to using character references MUST match the production for Char.

      And the Char production² reads

      Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
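
For illustration, the Char production can be written as a range check on a code point. This is a minimal sketch in plain standard C++; `isXmlChar` is a hypothetical name and not part of Qt or qdom.cpp.

```cpp
#include <cstdint>

// Hypothetical helper (not Qt API): true if the code point cp matches
// the XML 1.0 Char production quoted above.
constexpr bool isXmlChar(std::uint32_t cp)
{
    return cp == 0x9 || cp == 0xA || cp == 0xD
        || (cp >= 0x20 && cp <= 0xD7FF)       // excludes surrogates D800-DFFF
        || (cp >= 0xE000 && cp <= 0xFFFD)     // excludes FFFE and FFFF
        || (cp >= 0x10000 && cp <= 0x10FFFF); // supplementary planes
}
```

With this check, U+1F600 is a legal Char, while the two UTF-16 code units that represent it (0xD83D and 0xDE00) are not.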
      

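      eddy found, though, that encodeText() in qdom.cpp encodes a character that requires a surrogate pair as two character references, each expanding to a single surrogate code point (in the range #xD800-#xDFFF). That is exactly what the spec forbids: the two surrogates must be combined into one code point and written as a single character reference.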
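
A minimal sketch of the expected behaviour, assuming UTF-16 input and using plain standard C++ rather than QString so that it stands alone. `toCharRefs` is a hypothetical helper, not the actual encodeText() from qdom.cpp, and it escapes every character only to keep the example short.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not Qt's encodeText()): writes each code point of a
// UTF-16 string as one XML numeric character reference.
std::string toCharRefs(const std::u16string &text)
{
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        std::uint32_t cp = text[i];
        const bool isHigh = cp >= 0xD800 && cp <= 0xDBFF;
        if (isHigh && i + 1 < text.size()) {
            const std::uint32_t low = text[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                // Combine the pair into one supplementary-plane code point.
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i; // the low surrogate has been consumed
            }
        }
        // Note: a lone surrogate falls through unchanged here and would
        // still produce an invalid reference; handling it is out of scope.
        char buf[16];
        std::snprintf(buf, sizeof buf, "&#x%X;", static_cast<unsigned>(cp));
        out += buf;
    }
    return out;
}
```

For U+1F600, stored as the code units 0xD83D 0xDE00, this yields `&#x1F600;`. Encoding each code unit separately would instead give `&#xD83D;&#xDE00;`, which is the ill-formed output described above.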

      ¹ https://www.w3.org/TR/REC-xml/#wf-Legalchar
      ² https://www.w3.org/TR/REC-xml/#NT-Char

          People

            Magdalena Stojek (magdalenas)
            Marc Mutz (mmutz)
            Vladimir Minenko
            Alex Blasche
            Votes: 0
            Watchers: 2

              Gerrit Reviews

                There are no open Gerrit changes