QTBUG-71410

QString::toUtf8() not returning correct UTF-8 representation

Details

    • Type: Bug
    • Resolution: Invalid
    • Priority: Not Evaluated
    • Affects Version/s: 5.11.2
    • Platform/s: Windows

    Description

      QString::toUtf8() does not return the correct UTF-8 byte sequence when the QString was created from input that is not UTF-8 encoded. On Windows, the default charset is Windows-1252 (cp1252), but QString::toUtf8() should still return a correct UTF-8 representation.

      The following snippet can be used to reproduce the issue:

      #include <QTextStream>
      
      int main()
      {
          // No codec is set explicitly, so both streams use the locale codec (cp1252 here).
          QTextStream out(stdout, QIODevice::WriteOnly);
          QTextStream in(stdin, QIODevice::ReadOnly);
          out << "Enter unicode char: " << flush;
          QString line = in.readLine();
          // Echo the line as read, followed by a hex dump of its UTF-8 encoding.
          out << "Displays correctly: " << line << ", but not the correct utf-8 code: " << line.toUtf8().toHex() << endl;
      
          return 0;
      }

      Unless the QTextStream codec is changed, this reads a character in cp1252 and writes it back in the same encoding. In between, the character is held in a UTF-16 QString, so toUtf8() should produce a correct UTF-8 byte sequence.

      Example input:

      ä

      Expected output: 

      Displays correctly: ä, but not the correct utf-8 code: c3a4
      

      Actual output on Msys2 / mintty:

      Displays correctly: ä, but not the correct utf-8 code: c383c2a4
      

      This is probably correct, assuming mintty uses UTF-8 as its encoding while Qt expects cp1252. However, when the program is compiled natively and run in PowerShell, I get:

      Displays correctly: ä, but not the correct utf-8 code: e2809e
      

      Which makes no sense at all to me.
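
      Note: both hex dumps above are consistent with a byte-level reading. If every incoming byte is decoded as cp1252, the UTF-8 bytes c3 a4 sent by mintty become Ã and ¤, which re-encode as c383c2a4. The PowerShell result e2809e is the UTF-8 encoding of U+201E („), the character cp1252 assigns to byte 0x84. The OEM code pages cp437 and cp850 place ä at 0x84, so the console may be delivering input in one of those code pages rather than cp1252; that last step is an assumption, not something confirmed in this report. A minimal sketch in plain C++ (no Qt; all helper names are hypothetical, and decodeCp1252 only handles the bytes that occur here):

```cpp
#include <string>

// Encode one code point below U+10000 as UTF-8 (enough for every case in this report).
std::string encodeUtf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Simplified cp1252 decoder (assumption: only the bytes from this report matter).
// 0x84 maps to U+201E; bytes 0xA0..0xFF match Latin-1, so they map 1:1.
// The remaining 0x80..0x9F entries of the real table are NOT implemented.
char32_t decodeCp1252(unsigned char b) {
    if (b == 0x84) return 0x201E;
    return b;
}

// Lowercase hex dump, in the same format as QByteArray::toHex().
std::string toHex(const std::string& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (unsigned char c : bytes) {
        out += digits[c >> 4];
        out += digits[c & 0x0F];
    }
    return out;
}

// Decode raw input bytes as cp1252, re-encode as UTF-8, and return the hex dump.
std::string cp1252ToUtf8Hex(const std::string& raw) {
    std::string utf8;
    for (unsigned char b : raw) utf8 += encodeUtf8(decodeCp1252(b));
    return toHex(utf8);
}
```

      With these helpers, cp1252ToUtf8Hex("\xE4") gives the expected c3a4 (ä really sent as cp1252), cp1252ToUtf8Hex("\xC3\xA4") gives the mintty result c383c2a4, and cp1252ToUtf8Hex("\x84") gives the PowerShell result e2809e.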

      In fact, even when bypassing the QTextStream magic and reading the raw bytes via QFile, I cannot find a way to properly decode my string:

      QFile in;
      in.open(stdin, QIODevice::ReadOnly);
      
      // produces ä and e2809e0a
      QString line = QString::fromLocal8Bit(in.readLine());
      
      // produces ? and efbfbd0a
      // QString line = QString::fromUtf8(in.readLine());
      
      // produces ä and c2840a
      // QString line = QString::fromLatin1(in.readLine());
      
      out << line << " " << line.toUtf8().toHex();
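
      Note: the three hex dumps in the comments above can be reproduced by assuming the raw line is the single byte 0x84 followed by a line feed (the trailing 0a). The sketch below hard-codes what each decoder would yield for that one byte and adds cp850 for comparison, since cp850 (like cp437) maps 0x84 to ä. It is plain C++ with hypothetical names, not Qt code, and whether the console really uses such a code page is an assumption:

```cpp
#include <string>

// Decoders compared in this report, plus cp850 as a candidate console code page.
enum class Decoder { Cp1252, Latin1, Utf8, Cp850 };

// Code point each decoder yields for the single byte 0x84.
// Hard-coded for this one byte; this is not a general-purpose codec.
char32_t decode0x84(Decoder d) {
    switch (d) {
    case Decoder::Cp1252: return 0x201E; // cp1252 maps 0x84 to „
    case Decoder::Latin1: return 0x0084; // Latin-1 maps every byte 1:1 to a code point
    case Decoder::Utf8:   return 0xFFFD; // lone continuation byte: replacement character
    case Decoder::Cp850:  return 0x00E4; // cp850 (and cp437) map 0x84 to ä
    }
    return 0xFFFD;
}

// UTF-8 encode one code point below U+10000 and return it as lowercase hex.
std::string utf8Hex(char32_t cp) {
    std::string bytes;
    if (cp < 0x80) {
        bytes += static_cast<char>(cp);
    } else if (cp < 0x800) {
        bytes += static_cast<char>(0xC0 | (cp >> 6));
        bytes += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        bytes += static_cast<char>(0xE0 | (cp >> 12));
        bytes += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        bytes += static_cast<char>(0x80 | (cp & 0x3F));
    }
    static const char digits[] = "0123456789abcdef";
    std::string hex;
    for (unsigned char c : bytes) {
        hex += digits[c >> 4];
        hex += digits[c & 0x0F];
    }
    return hex;
}
```

      Here utf8Hex(decode0x84(Decoder::Cp1252)) is e2809e, Decoder::Utf8 gives efbfbd, and Decoder::Latin1 gives c284, matching the three results above; only Decoder::Cp850 yields the expected c3a4.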

          People

            Assignee: Thiago Macieira (thiago)
            Reporter: Janek Bevendorff (phoerious)