QTBUG-98656

QXmlInputSource (Qt5 Core compatibility) does not properly read encoding instruction

Details

    • 7450eda927436a59f34f1a1455a6d6a9515d8156 (qt/qt5compat/dev)

    Description

      I'm testing a simple XML document whose declared encoding is not UTF-8:

      <?xml version="1.0" encoding="iso-8859-1"?>
      <child><t>Hällo, world</t></child>
      

      However, the declared encoding is not recognized. Instead, I receive garbage for the umlaut character.
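
      The symptom can be reproduced without Qt. The following is a minimal sketch, not code from qt5compat: the helper names latin1ToUtf8 and looksLikeUtf8 are hypothetical, and the validity check is deliberately simplified (it does not reject overlong forms or surrogates). It shows that the ISO-8859-1 byte for "ä" (0xE4) is not a well-formed UTF-8 sequence, so a decoder that falls back to UTF-8 cannot produce the intended character.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: convert ISO-8859-1 bytes to UTF-8.
// Every Latin-1 byte value equals its Unicode code point.
std::string latin1ToUtf8(const std::string &latin1)
{
    std::string utf8;
    for (unsigned char byte : latin1) {
        if (byte < 0x80) {
            utf8 += static_cast<char>(byte);                 // ASCII is unchanged
        } else {
            utf8 += static_cast<char>(0xC0 | (byte >> 6));   // two-byte lead
            utf8 += static_cast<char>(0x80 | (byte & 0x3F)); // continuation
        }
    }
    return utf8;
}

// Hypothetical helper: simplified structural check for UTF-8.
// Only verifies lead bytes and the number of continuation bytes.
bool looksLikeUtf8(const std::string &bytes)
{
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;
        if (lead < 0x80)                extra = 0;
        else if ((lead & 0xE0) == 0xC0) extra = 1;
        else if ((lead & 0xF0) == 0xE0) extra = 2;
        else if ((lead & 0xF8) == 0xF0) extra = 3;
        else return false;                       // stray continuation or invalid lead

        if (i + extra >= bytes.size())
            return false;                        // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                return false;                    // expected a continuation byte
        }
        i += extra + 1;
    }
    return true;
}
```

      With the Latin-1 bytes of "Hällo" ("H\xE4llo"), looksLikeUtf8 reports a malformed sequence, while latin1ToUtf8 yields the bytes "H\xC3\xA4llo".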

      After debugging the issue, I think the problem is in qt5compat/src/core5/sax/qxml.cpp, from line 1348 onward, in the method QXmlInputSource::fromRawData:

      ...
              bool needMoreText;
              QByteArray encoding = extractEncodingDecl(d->encodingDeclChars, &needMoreText).toLatin1();
      
              if (!encoding.isEmpty()) {
                  auto e = QStringDecoder::encodingForData(encoding);
                  if (e && *e != QStringDecoder::Utf8) {
                      d->toUnicode = QStringDecoder(*e);
      ...
      

      "extractEncodingDecl" correctly returns the encoding name "iso-8859-1". However, "QStringDecoder::encodingForData" does not seem to look up a decoder by that name. Instead, it tries to guess the encoding from the content of the byte array it is given, which here is the encoding name itself rather than document data.

      Qt 5 used "QTextCodec::codecForName" at this point, which looks the codec up by name and gives the desired result:

      ...
              bool needMoreText;
              QString encoding = extractEncodingDecl(d->encodingDeclChars, &needMoreText);
      
              if (!encoding.isEmpty()) {
                  if (QTextCodec *codec = QTextCodec::codecForName(std::move(encoding).toLatin1())) {
                      /* If the encoding is the same, we don't have to do toUnicode() all over again. */
                      if(codec->mibEnum() != mib) {
                          delete d->encMapper;
      ...
      

          People

            thiago Thiago Macieira
            matthias67 schlumpf gemüse