Details
Type: Bug
Resolution: Unresolved
Priority: P3: Somewhat important
None
6.7
None
Description
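#include <QChar>
#include <QDebug>

// All BMP code points, U+FFFF down to U+0000.
for (char32_t ch = 0x10000; ch-- > 0;) {
    const char32_t up = QChar::toUpper(ch), low = QChar::toLower(ch);
    // Neither toLower(toUpper(ch)) nor toUpper(toLower(ch)) gets back to ch:
    if (QChar::toLower(up) != ch && QChar::toUpper(low) != ch)
        qDebug("%04X not in {lower(%04X) = %04X, upper(%04X) = %04X}", uint(ch), uint(up),
               uint(QChar::toLower(up)), uint(low), uint(QChar::toUpper(low)));
}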
This loop reports 59 matches. Most of them genuinely don't round-trip via toUpper()/toLower() in the sense it tests. However, for its matches with low in the range U+1F80 through U+1FF3 it reports toUpper(low) == low, whereas FileFormat.info says these do round-trip. For example, https://www.fileformat.info/info/unicode/char/1f80 gives U+1F88 as the upper-case of U+1F80, not U+1F80 itself.
(Note: the same code also reveals that U+1E9E LATIN CAPITAL LETTER SHARP S could serve as a single-QChar upper-case for U+00DF LATIN SMALL LETTER SHARP S. The proper upper-case of U+00DF is "SS", a two-character string that the single QChar returned by toUpper() can't represent. At present QChar upper-cases U+00DF to itself, although QString correctly upper-cases it to "SS". We could instead have QChar, technically incorrectly, upper-case it to U+1E9E. That would at least be an upper-case character that lower-cases back to U+00DF. This is arguably better than U+00DF claiming to be its own upper-case while also reporting itself as lower-case.)
Compare:
// Lower-case code points that are their own upper-case:
for (char32_t ch = 0x10000; ch-- > 0;) {
    if (QChar::isLower(ch) && QChar::toUpper(ch) == ch)
        qDebug("U+%04X claims to be lower-case but also upper-cases to itself", uint(ch));
}
which reports 329 hits. I have not studied these in detail, but they may also be worth looking into.