Details
Bug
Resolution: Unresolved
P3: Somewhat important
None
6.5, 6.6, 6.7
None
b099988f6 (dev), d5eb5d2f8 (dev)
Description
Note: this applies only to the "System" encoding of QStringConverter!
Since we recently enabled storing more than one character in the state, the current behavior is as follows: we try to restore what was stored, and if the result is still invalid, we discard the stored bytes and decode the rest of the string without them.
Ideally, we would discard only the invalid bytes, up to the boundary of the next character, but this is not always easy to achieve. For instance, the second and third octets of a sequence might, by themselves, decode to a valid character, so there is no way to tell whether that was the intent.
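The ideal recovery can be sketched for the UTF-8 case, where a lead byte and a continuation byte (0x80 to 0xBF) can be told apart by value alone, so the next character boundary is unambiguous. For other system code pages that distinction may not be available, which is the ambiguity described above. The sketch is plain standard C++, independent of Qt: `examine` and `decodeWithResync` are hypothetical names, and the recovery rule (one U+FFFD per maximal invalid subpart, using the well-formed byte ranges from the Unicode Standard) is a swapped-in technique, not a description of what QStringConverter does. It decodes a complete buffer in a single call, so it does not model state carried between calls.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Outcome of examining one UTF-8 sequence at a given offset.
struct Step {
    enum Kind { Ok, Invalid, Incomplete } kind;
    std::size_t length;  // bytes to skip: the whole character or the maximal invalid subpart
    char32_t codePoint;  // meaningful only when kind == Ok
};

// Examines the sequence starting at data[pos] using the well-formed UTF-8
// byte ranges from the Unicode Standard (overlongs, surrogates and values
// above U+10FFFF are rejected).
Step examine(const unsigned char *data, std::size_t size, std::size_t pos) {
    const unsigned char b = data[pos];
    if (b < 0x80)
        return {Step::Ok, 1, b};
    std::size_t need;
    unsigned char lo = 0x80, hi = 0xBF;  // allowed range for the second byte
    char32_t cp;
    if (b >= 0xC2 && b <= 0xDF) {
        need = 2; cp = b & 0x1F;
    } else if (b >= 0xE0 && b <= 0xEF) {
        need = 3; cp = b & 0x0F;
        if (b == 0xE0) lo = 0xA0;  // overlong
        if (b == 0xED) hi = 0x9F;  // surrogate
    } else if (b >= 0xF0 && b <= 0xF4) {
        need = 4; cp = b & 0x07;
        if (b == 0xF0) lo = 0x90;  // overlong
        if (b == 0xF4) hi = 0x8F;  // above U+10FFFF
    } else {
        return {Step::Invalid, 1, 0};  // 80..C1 and F5..FF never start a character
    }
    for (std::size_t k = 1; k < need; ++k) {
        if (pos + k >= size)
            return {Step::Incomplete, k, 0};  // truncated at the end of the buffer
        const unsigned char c = data[pos + k];
        if (c < (k == 1 ? lo : 0x80) || c > (k == 1 ? hi : 0xBF))
            return {Step::Invalid, k, 0};  // resynchronize at c, do not consume it
        cp = (cp << 6) | (c & 0x3F);
    }
    return {Step::Ok, need, cp};
}

// Decodes a complete buffer, replacing each invalid or truncated subpart with
// one U+FFFD instead of discarding everything around it.
std::u32string decodeWithResync(const std::string &bytes) {
    const auto *data = reinterpret_cast<const unsigned char *>(bytes.data());
    std::u32string out;
    std::size_t pos = 0;
    while (pos < bytes.size()) {
        const Step s = examine(data, bytes.size(), pos);
        assert(s.length > 0);  // every step makes progress
        out += (s.kind == Step::Ok) ? s.codePoint : U'\uFFFD';
        pos += s.length;
    }
    return out;
}
```

Applied to the bytes E4 E4 BD A0, this returns U+FFFD followed by U+4F60: the first E4 is rejected as soon as the next byte turns out not to be a continuation byte, and decoding resumes at that byte.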
Concretely, given:

    result = QLocal8Bit::convertToUnicode_sys("\xe4\xe4\xbd", UTF8, &state);
    result += QLocal8Bit::convertToUnicode_sys("\xa0", UTF8, &state);

The logically correct output would arguably be a replacement character (for the incomplete first \xe4) followed by 你 (U+4F60, encoded in UTF-8 as E4 BD A0). Currently, however, the whole sequence is discarded and the result is four replacement characters.
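To see why the last three bytes belong together: a code point from U+0800 to U+FFFF is encoded in UTF-8 with the bit pattern 1110xxxx 10xxxxxx 10xxxxxx. The hypothetical helper below (plain C++, not part of Qt) applies that layout; for U+4F60 it produces E4 BD A0, which leaves the first \xe4 as an orphaned lead byte.

```cpp
#include <cassert>
#include <string>

// Encodes a code point from U+0800 to U+FFFF (excluding surrogates) using the
// three-byte UTF-8 pattern 1110xxxx 10xxxxxx 10xxxxxx.
std::string encodeThreeByteUtf8(char32_t cp) {
    assert(cp >= 0x800 && cp <= 0xFFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    out += static_cast<char>(0xE0 | (cp >> 12));          // top 4 bits
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));  // middle 6 bits
    out += static_cast<char>(0x80 | (cp & 0x3F));         // low 6 bits
    return out;
}
```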
More importantly, the QStringConverter machinery in general gives us no way to signal that the internal buffers need to be drained because the converter will not be called again.
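A minimal sketch of what an explicit drain could look like, assuming a hypothetical streaming UTF-8 decoder in plain C++. The class `Utf8StreamDecoder` and its `decode()`, `drain()` and `hasPending()` members are illustrative only; the report states that the QStringConverter machinery has no equivalent end-of-input signal. Bytes of a sequence that is still incomplete at the end of a chunk are held back, similar in spirit to the bytes stored in the converter state, and `drain()` turns them into a single U+FFFD once the caller says no more input will come. Invalid bytes are replaced per maximal subpart, a swapped-in UTF-8 practice rather than Qt's behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical streaming UTF-8 decoder (not a Qt API). An incomplete sequence
// at the end of a chunk is held in pending_ until more bytes arrive, similar
// in spirit to the bytes a converter keeps in its state between calls.
class Utf8StreamDecoder {
public:
    std::u32string decode(const std::string &chunk) {
        const std::string buf = pending_ + chunk;
        pending_.clear();
        std::u32string out;
        std::size_t pos = 0;
        while (pos < buf.size()) {
            std::size_t len = 0;
            char32_t cp = 0;
            switch (examine(buf, pos, len, cp)) {
            case Result::Ok:
                out += cp;
                break;
            case Result::Invalid:
                out += U'\uFFFD';  // one replacement per maximal invalid subpart
                break;
            case Result::Incomplete:
                pending_ = buf.substr(pos);  // keep the tail, wait for more input
                return out;
            }
            assert(len > 0);  // every step makes progress
            pos += len;
        }
        return out;
    }

    // Hypothetical end-of-input signal: pending bytes can no longer be
    // completed, so they become one U+FFFD. Without this call they are lost.
    std::u32string drain() {
        std::u32string out;
        if (!pending_.empty())
            out += U'\uFFFD';
        pending_.clear();
        return out;
    }

    bool hasPending() const { return !pending_.empty(); }

private:
    enum class Result { Ok, Invalid, Incomplete };

    // Well-formed UTF-8 byte ranges as listed in the Unicode Standard.
    static Result examine(const std::string &buf, std::size_t pos,
                          std::size_t &len, char32_t &cp) {
        const auto at = [&buf](std::size_t i) {
            return static_cast<unsigned char>(buf[i]);
        };
        const unsigned char b = at(pos);
        len = 1;
        if (b < 0x80) { cp = b; return Result::Ok; }
        std::size_t need;
        unsigned char lo = 0x80, hi = 0xBF;  // allowed range for the second byte
        if (b >= 0xC2 && b <= 0xDF) { need = 2; cp = b & 0x1F; }
        else if (b >= 0xE0 && b <= 0xEF) {
            need = 3; cp = b & 0x0F;
            if (b == 0xE0) lo = 0xA0;  // overlong
            if (b == 0xED) hi = 0x9F;  // surrogate
        } else if (b >= 0xF0 && b <= 0xF4) {
            need = 4; cp = b & 0x07;
            if (b == 0xF0) lo = 0x90;  // overlong
            if (b == 0xF4) hi = 0x8F;  // above U+10FFFF
        } else {
            return Result::Invalid;  // 80..C1 and F5..FF never start a character
        }
        for (std::size_t k = 1; k < need; ++k) {
            if (pos + k >= buf.size())
                return Result::Incomplete;  // truncated: more bytes may follow
            const unsigned char c = at(pos + k);
            if (c < (k == 1 ? lo : 0x80) || c > (k == 1 ? hi : 0xBF)) {
                len = k;  // resynchronize at c without consuming it
                return Result::Invalid;
            }
            cp = (cp << 6) | (c & 0x3F);
        }
        len = need;
        return Result::Ok;
    }

    std::string pending_;
};
```

Feeding "\xe4\xe4\xbd" and then "\xa0" yields U+FFFD followed by 你, with nothing left pending. If the stream instead ends right after "\xe4\xbd", those two bytes remain pending; only a call to `drain()` turns them into a replacement character, and a caller that never makes that call loses them silently.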
Gerrit Reviews
For Gerrit Dashboard: QTBUG-118834

| # (change,patch set) | Subject | Branch | Project | Status | Code-Review | Verified |
|---|---|---|---|---|---|---|
| 538670,5 | QStringConverter: add a test for missing drain | dev | qt/qtbase | MERGED | +2 | 0 |
| 538728,10 | QLocal8Bit::convertToUnicode[win]: rewrite remainingChars handling as recursive | dev | qt/qtbase | MERGED | +2 | 0 |