  1. Qt
  2. QTBUG-116085

QXmlSimpleReader truncates character references of non-BMP characters


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: P2: Important
    • Fix Version/s: 5.15.16
    • Affects Version/s: 5.15.10
    • Component/s: XML: DOM
    • Labels: None
    • Commits: 245437abb (tqtc/lts-5.15)

    Description

      QXmlSimpleReaderPrivate::parseReference in sax/qxml.cpp converts character references by doing

      tmp = ref().toUInt(&ok, 16);
      stringAddC(QChar(tmp));
      

      which truncates any code point in the supplementary planes (above U+FFFF) to a single 16-bit QChar. The correct behaviour is to append the code point's high and low surrogates instead.
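For illustration, the surrogate split described above can be sketched in standard C++ without Qt. This is not the change that landed in 245437abb, which is not shown in this report; `appendCodePoint` is a hypothetical helper name, and `std::u16string` stands in for QString. In Qt itself, the static functions `QChar::requiresSurrogates()`, `QChar::highSurrogate()` and `QChar::lowSurrogate()` cover the same arithmetic.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of Qt): appends one Unicode code point to a
// UTF-16 string. Code points above U+FFFF are written as a surrogate pair
// instead of being narrowed to a single 16-bit unit.
inline void appendCodePoint(std::u16string &out, std::uint32_t codePoint)
{
    assert(codePoint <= 0x10FFFF); // precondition: within the Unicode range

    if (codePoint < 0x10000) {
        // Basic Multilingual Plane: one UTF-16 code unit is enough.
        out.push_back(static_cast<char16_t>(codePoint));
        return;
    }

    // Supplementary planes: 20 significant bits, split 10/10.
    const std::uint32_t offset = codePoint - 0x10000;
    out.push_back(static_cast<char16_t>(0xD800 + (offset >> 10)));   // high surrogate
    out.push_back(static_cast<char16_t>(0xDC00 + (offset & 0x3FF))); // low surrogate
}
```

Calling `appendCodePoint(s, 0x1F469)` on an empty string leaves `s` holding the two code units 0xD83D and 0xDC69. The sketch does not reject lone surrogate values (U+D800 to U+DFFF) passed in as input; a real parser would likely want to treat those as errors.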

      Reproducer:

      #include <QDebug>
      #include <QDomDocument>
      #include <QXmlSimpleReader>
      #include <QXmlStreamReader>
      
      int main(int argc, char *argv[])
      {
          QString xml = QStringLiteral(u"<test>&#x1F469;</test>");
          QDomDocument doc;
          QString errorMsg;
      #if 1
          QXmlSimpleReader reader;
          reader.setFeature("http://qt-project.org/xml/features/report-whitespace-only-CharData", true);
          reader.setFeature("http://xml.org/sax/features/namespaces", false);
          reader.setFeature("http://xml.org/sax/features/namespace-prefixes", true);
          QXmlInputSource source;
          source.setData(xml);
          doc.setContent(&source, &reader, &errorMsg);
      #else
          QXmlStreamReader streamReader(xml);
          doc.setContent(&streamReader, false, &errorMsg);
      #endif
          qDebug() << errorMsg;
          qDebug().quote() << doc.documentElement().text();
      }
      

      Actual Output:

      ""
      "\uF469"
      

      Expected Output:

      ""
      "👩"
      
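The actual output is consistent with the reference value 0x1F469 being narrowed to its low 16 bits: 0x1F469 & 0xFFFF is 0xF469, a private-use code point rather than the intended character. A minimal sketch of that arithmetic, assuming a plain 16-bit truncation (`truncateToUtf16Unit` is a hypothetical name used only for illustration):

```cpp
#include <cstdint>

// Illustration only: models what happens when a 32-bit code point is stored
// in a single 16-bit code unit. Only the low 16 bits survive.
constexpr char16_t truncateToUtf16Unit(std::uint32_t codePoint)
{
    return static_cast<char16_t>(codePoint & 0xFFFF);
}

// U+1F469 loses its plane bits and becomes U+F469.
static_assert(truncateToUtf16Unit(0x1F469) == 0xF469,
              "truncation drops the plane bits");
```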

      This probably also affects Qt5Compat, though I haven't tested it because the reproducer doesn't build on Qt 6.

      Downstream bug:
      https://bugs.kde.org/show_bug.cgi?id=473380

          People

            Assignee: Edward Welbourne (Eddy)
            Reporter: Alvin Wong (alvinhochun)

              Time Tracking

                Estimated: Not Specified
                Remaining: 0m
                Logged: 2h
