Details
- Bug
- Resolution: Unresolved
- P2: Important
- None
- 6.5, 6.6, 6.7, 6.8, 6.9.0 RC
- None
- 8
- Foundation Sprint 128
Description
The QXmlStreamReader::addData(QAnyStringView) overload unconditionally converts Latin1 and UTF-16 strings to UTF-8. This is fine if it is the first data added to the reader, or if the previously added data was also UTF-8 (or was converted to it).
However, consider the case where the reader was first given raw data (via the constructor taking a QByteArray or via the addData(QByteArray) overload), and this raw data had a proper XML prolog with an "encoding" attribute. The specified encoding could be Latin1 or UTF-16, in which case the parser internally uses the respective decoder.
Appending a string in that same encoding afterwards should just work. In practice it results in corrupted data (for Latin1) or even a parsing error (for UTF-16), because the appended string is converted to UTF-8 while the parser keeps using the decoder selected by the prolog.
This can be illustrated by a simple example:
    const QString utf16Str = u"<?xml version=\"1.0\" encoding=\"utf-16\"?>"
                             "<foo><a>Some Data</a>"_s;
    // as byte array
    const QByteArray utf16Data{ reinterpret_cast<const char *>(utf16Str.utf16()),
                                utf16Str.size() * 2 };
    QXmlStreamReader reader(utf16Data);
    bool res = reader.readNextStartElement(); // Read <foo>, OK
    res = reader.readNextStartElement(); // Read <a>, OK
    QString text = reader.readElementText();
    // append more UTF-16 data
    reader.addData(u"<a>Other Data</a>"_s);
    res = reader.readNextStartElement(); // Read <a>, FAIL!
    qDebug() << reader.errorString(); // "Premature end of document."
The problem is that the newly added data is converted to UTF-8 and therefore has half as many bytes as the UTF-16 decoder expects.
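The size mismatch can be checked without Qt. The following is only a sketch using the C++ standard library; the helper names utf16ByteCount and utf8ByteCount are made up for illustration and are not part of QXmlStreamReader. It counts the bytes of the appended fragment in both encodings.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: bytes occupied by the text as raw UTF-16 code units.
std::size_t utf16ByteCount(std::u16string_view text)
{
    return text.size() * sizeof(char16_t);
}

// Hypothetical helper: bytes occupied by the same text once converted to UTF-8.
// A valid surrogate pair counts as one 4-byte sequence; an unpaired surrogate
// is counted as 3 bytes (the size of a U+FFFD replacement character).
std::size_t utf8ByteCount(std::u16string_view text)
{
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char16_t c = text[i];
        const bool highSurrogate = c >= 0xD800 && c <= 0xDBFF;
        const bool pairFollows = highSurrogate && i + 1 < text.size()
                && text[i + 1] >= 0xDC00 && text[i + 1] <= 0xDFFF;
        if (c < 0x80) {
            bytes += 1;
        } else if (c < 0x800) {
            bytes += 2;
        } else if (pairFollows) {
            bytes += 4;
            ++i; // skip the low surrogate, it was counted with the pair
        } else {
            bytes += 3;
        }
    }
    return bytes;
}
```

For the ASCII-only fragment u"<a>Other Data</a>" this gives 34 bytes as UTF-16 but only 17 bytes as UTF-8, so a decoder that consumes two bytes per code unit sees half the expected input, and those bytes do not form the intended characters.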
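For the Latin1 + Latin1 case the parsing will work, but non-ASCII characters will be read incorrectly: each of them becomes a multi-byte UTF-8 sequence, and the Latin1 decoder then reads every byte of that sequence as a separate character.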
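The Latin1 corruption can be reproduced outside Qt as well. This is a minimal sketch with the C++ standard library only; toUtf8 and decodeLatin1 are hypothetical helpers that stand in for the conversion done by addData() and for the parser's Latin1 decoder, and they are not the actual Qt implementation.

```cpp
#include <string>
#include <string_view>

// Hypothetical stand-in for the conversion to UTF-8.
// Assumption: every code point is below U+0800, which covers all of Latin1.
std::string toUtf8(std::u16string_view text)
{
    std::string out;
    for (char16_t c : text) {
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            // Two-byte sequence: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Hypothetical stand-in for a Latin1 decoder: every byte maps to the
// code point with the same numeric value.
std::u16string decodeLatin1(std::string_view bytes)
{
    std::u16string out;
    for (char b : bytes)
        out += static_cast<char16_t>(static_cast<unsigned char>(b));
    return out;
}
```

With these helpers, decodeLatin1(toUtf8(u"caf\u00E9")) yields five characters instead of four: U+00E9 is encoded as the bytes 0xC3 0xA9, which the Latin1 decoder reads back as U+00C3 followed by U+00A9. ASCII characters pass through unchanged, which is why the parse itself still succeeds.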
Issue Links
- relates to
  - QTBUG-135033 QXmlStreamReader::addData() can parse Latin1 data incorrectly (Closed)
  - QTBUG-124636 QXmlStreamReader: don't convert input (In Progress)
Gerrit Reviews
For Gerrit Dashboard: QTBUG-135129

# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
634101,4 | QXmlStreamReader: check appending data with unexpected encoding | dev | qt/qtbase | NEW | 0 | 0 |
636060,1 | QXmlStreamReader: fix addData() unnecessary conversion to UTF-8 | dev | qt/qtbase | NEW | 0 | 0 |