QTBUG-135033

QXmlStreamReader::addData() can parse Latin1 data incorrectly


    • 4b8659ebf (dev), cfda48772 (6.9), d852646d3 (6.8), 536610798 (tqtc/lts-6.5)
    • Foundation Sprint 128

      The patch https://codereview.qt-project.org/c/qt/qtbase/+/419210 converted the QXmlStreamReader constructor and the addData() method to take QAnyStringView.

      However, it didn't set the lockEncoding flag when handling Latin1 strings in addData().

      This can lead to an incorrect result when a Latin1-encoded XML document with a proper "encoding" attribute is passed to addData() as a Latin1 string:

      • first, the data is converted to UTF-8
      • later, the parser reads the "encoding" attribute and tries to convert the data again according to the specified encoding, although the data is no longer in that encoding.
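
      The effect of the two steps above can be reproduced at the byte level without Qt. The following is a minimal sketch in plain standard C++, not the actual QXmlStreamReader code path: latin1ToUtf8() and decodeAsLatin1() are hypothetical helper names introduced only for illustration. It assumes that the second step amounts to decoding the already converted UTF-8 bytes as ISO-8859-1, which matches the result reported in this issue.

```cpp
#include <string>

// Hypothetical helper: encode Latin1 bytes as UTF-8.
// Every Latin1 byte maps to the code point with the same value (U+0000..U+00FF).
std::string latin1ToUtf8(const std::string &latin1)
{
    std::string utf8;
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            utf8 += static_cast<char>(c);                 // ASCII: one byte
        } else {
            utf8 += static_cast<char>(0xC0 | (c >> 6));   // lead byte
            utf8 += static_cast<char>(0x80 | (c & 0x3F)); // continuation byte
        }
    }
    return utf8;
}

// Hypothetical helper: decode bytes as Latin1, one code point per byte.
// Applied to UTF-8 input, this models the unwanted second conversion.
std::u32string decodeAsLatin1(const std::string &bytes)
{
    std::u32string out;
    for (unsigned char c : bytes)
        out += static_cast<char32_t>(c);
    return out;
}
```

      Calling decodeAsLatin1(latin1ToUtf8("M\xE5rten")) turns the single byte 0xE5 into the two UTF-8 bytes 0xC3 0xA5, and then reads each of them as a separate character, giving U+00C3 U+00A5 in place of U+00E5.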

      A simple test that illustrates the problem:

      using namespace Qt::StringLiterals; // provides the _L1 literal operator

      // Latin1-encoded document that declares its own encoding
      const auto in = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>"
                      "<a>M\xE5rten</a>"_L1;
      QXmlStreamReader reader;
      reader.addData(in);
      QVERIFY(reader.readNextStartElement());
      QString text = reader.readElementText();
      QCOMPARE(text, "M\xE5rten"_L1); // FAIL! The result is "M\u00C3\u00A5rten"
      

      The QXmlStreamReader(QAnyStringView) constructor is not affected, because it already sets the flag correctly.


            ivan.solovev Ivan Solovev
            ivan.solovev Ivan Solovev
            Vladimir Minenko Vladimir Minenko
            Alex Blasche Alex Blasche