Bug
Resolution: Done
P2: Important
6.2.1
None
7450eda927436a59f34f1a1455a6d6a9515d8156 (qt/qt5compat/dev)
I'm testing a simple XML document whose declared encoding is not UTF-8:
<?xml version="1.0" encoding="iso-8859-1"?> <child><t>Hällo, world</t></child>
However, the encoding is not recognized, and the umlaut character comes out as garbage.
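The garbage is consistent with the ISO-8859-1 bytes being decoded as UTF-8. The following is a minimal sketch in standard C++, not Qt code; latin1ToUtf8 and isWellFormedUtf8 are hypothetical helper names introduced only for illustration. It shows that the Latin-1 byte for "ä" (0xE4) is not a valid UTF-8 sequence on its own, while a byte-wise Latin-1 conversion produces the expected two-byte UTF-8 form.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: convert ISO-8859-1 bytes to UTF-8.
// Every Latin-1 byte maps to the Unicode code point of the same value.
std::string latin1ToUtf8(const std::string &in)
{
    std::string out;
    for (unsigned char c : in) {
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            // Code points 0x80..0xFF need two bytes in UTF-8.
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: simplified structural check for UTF-8.
// It verifies lead bytes and continuation bytes only; overlong forms
// and surrogates are not rejected.
bool isWellFormedUtf8(const std::string &in)
{
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        std::size_t extra = 0;
        if (c < 0x80)
            extra = 0;
        else if ((c & 0xE0) == 0xC0)
            extra = 1;
        else if ((c & 0xF0) == 0xE0)
            extra = 2;
        else if ((c & 0xF8) == 0xF0)
            extra = 3;
        else
            return false; // stray continuation byte or invalid lead byte
        if (i + extra >= in.size())
            return false; // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char next = static_cast<unsigned char>(in[i + k]);
            if ((next & 0xC0) != 0x80)
                return false; // expected a continuation byte
        }
        i += extra + 1;
    }
    return true;
}
```

Feeding the Latin-1 bytes of "Hällo" ("H\xE4llo") to isWellFormedUtf8 fails, because 0xE4 looks like the lead byte of a three-byte sequence and the following "l" bytes are not continuation bytes. A UTF-8 decoder typically emits a replacement character in that situation.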
After debugging the issue, I think the problem is in qt5compat/src/core5/sax/qxml.cpp, from line 1348 onward, in the method QXmlInputSource::fromRawData:
...
bool needMoreText;
QByteArray encoding = extractEncodingDecl(d->encodingDeclChars, &needMoreText).toLatin1();
if (!encoding.isEmpty()) {
    auto e = QStringDecoder::encodingForData(encoding);
    if (e && *e != QStringDecoder::Utf8) {
        d->toUnicode = QStringDecoder(*e);
...
"extractEncodingDecl" correctly reads the encoding as "iso-8859-1", but "QStringDecoder::encodingForData" does not seem to look up the decoder for that name. Instead, it tries to guess the encoding from the content of the bytes it is given, which here are the characters of the encoding name itself.
The Qt 5 versions used "QTextCodec::codecForName", which yields the desired result:
...
bool needMoreText;
QString encoding = extractEncodingDecl(d->encodingDeclChars, &needMoreText);
if (!encoding.isEmpty()) {
    if (QTextCodec *codec = QTextCodec::codecForName(std::move(encoding).toLatin1())) {
        /* If the encoding is the same, we don't have to do toUnicode() all over again. */
        if (codec->mibEnum() != mib) {
            delete d->encMapper;
...
For Gerrit Dashboard: QTBUG-98656

| # | Subject | Branch | Project | Status | CR | V |
|---|---|---|---|---|---|---|
| 383921,2 | SAX: import the Qt 5.15 souce with QTextCodec | dev | qt/qt5compat | MERGED | +2 | 0 |