Details
Bug - Resolution: Invalid - Not Evaluated - None - 5.11.2 - None
Description
QString::toUtf8() does not return the correct UTF-8 byte sequence when the input the QString was created from is not UTF-8 encoded. On Windows, the default charset is Windows-1252 (cp1252), but QString::toUtf8() should still return a correct UTF-8 representation.
The following snippet can be used to reproduce the issue:
    #include <QTextStream>

    int main()
    {
        QTextStream out(stdout, QIODevice::WriteOnly);
        QTextStream in(stdin, QIODevice::ReadOnly);
        out << "Enter unicode char: " << flush;
        QString line = in.readLine();
        out << "Displays correctly: " << line
            << ", but not the correct utf-8 code: " << line.toUtf8().toHex() << endl;
        return 0;
    }
Unless the QTextStream codec is changed, this reads a character in cp1252 and writes it back in the same encoding. In between, the character is held as UTF-16 in the QString, so toUtf8() should yield a correct UTF-8 byte sequence.
Example input:
ä
Expected output:
Displays correctly: ä, but not the correct utf-8 code: c3a4
Actual output on Msys2 / mintty:
Displays correctly: ä, but not the correct utf-8 code: c383c2a4
This is probably correct, assuming mintty uses UTF-8 as its encoding while Qt expects cp1252. However, when the program is compiled natively and run in PowerShell, I get:
Displays correctly: ä, but not the correct utf-8 code: e2809e
Which makes no sense at all to me.
In fact, even when bypassing the QTextStream magic and reading the raw bytes via QFile, I cannot find a way to properly decode my string:
    QFile in;
    in.open(stdin, QIODevice::ReadOnly);
    // produces ä and e2809e0a
    QString line = QString::fromLocal8Bit(in.readLine());
    // produces ? and efbfbd0a
    // QString line = QString::fromUtf8(in.readLine());
    // produces ä and c2840a
    // QString line = QString::fromLatin1(in.readLine());
    out << line << " " << line.toUtf8().toHex();
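One possible reading of the hex values above is that the bytes arriving on stdin are not cp1252 in the first place. The sketch below uses plain C++ without Qt to replay the decode and re-encode steps by hand. It rests on two assumptions that the report does not confirm: that the PowerShell console delivers ä as the single byte 0x84 (its position in OEM code pages such as CP850 and CP437), and that mintty delivers the UTF-8 bytes c3 a4. The helpers encodeUtf8, decodeCp1252, decodeLatin1, recodeToUtf8, toHex and checkObservedValues are hypothetical names made up for this illustration, not Qt API, and encodeUtf8 only covers code points below U+10000.

```cpp
#include <cassert>
#include <string>

// Encode one code point below U+10000 as UTF-8 (enough for this illustration).
std::string encodeUtf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Latin-1 maps every byte to the code point with the same value.
char32_t decodeLatin1(unsigned char b) { return b; }

// cp1252 equals Latin-1 except for 0x80..0x9F. Bytes that cp1252 leaves
// undefined are passed through unchanged here (an assumption of this sketch).
char32_t decodeCp1252(unsigned char b) {
    static const char32_t high[32] = {
        0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
        0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178};
    return (b >= 0x80 && b <= 0x9F) ? high[b - 0x80] : b;
}

// Decode each input byte with the given single-byte codec, then emit UTF-8.
std::string recodeToUtf8(const std::string& input,
                         char32_t (*decode)(unsigned char)) {
    std::string out;
    for (unsigned char b : input) out += encodeUtf8(decode(b));
    return out;
}

// Lowercase hex dump, in the same style as QByteArray::toHex() output.
std::string toHex(const std::string& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (unsigned char b : bytes) {
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}

// Compare the hand-computed results with the values quoted in the report.
void checkObservedValues() {
    // Genuine cp1252 input (ä = 0xE4) gives the expected c3a4.
    assert(toHex(recodeToUtf8("\xE4", decodeCp1252)) == "c3a4");
    // Assumed mintty input (UTF-8 c3 a4) misread as cp1252.
    assert(toHex(recodeToUtf8("\xC3\xA4", decodeCp1252)) == "c383c2a4");
    // Assumed PowerShell input (OEM byte 0x84) read as cp1252: U+201E.
    assert(toHex(recodeToUtf8("\x84", decodeCp1252)) == "e2809e");
    // The same byte read as Latin-1: U+0084.
    assert(toHex(recodeToUtf8("\x84", decodeLatin1)) == "c284");
    // A lone 0x84 is not valid UTF-8; U+FFFD encodes as efbfbd.
    assert(toHex(encodeUtf8(0xFFFD)) == "efbfbd");
}
```

Calling checkObservedValues() from a main() passes silently, so under these assumptions every value in the report is what a correct toUtf8() would return for the string Qt actually decoded. The apparently correct display would then come from the same mismatch applied in reverse on output. If that reading holds, the fix lies in matching the stream codec to the console code page rather than in toUtf8() itself.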