Details
Suggestion
Resolution: Out of scope
Not Evaluated
None
4.7.2, 5.3.2
None
Description
QString and QChar don't offer full Unicode support. Although the QString class can already handle UTF-16 surrogate pairs correctly, it doesn't provide a consistent interface. The problem is that all QString indices refer to 16-bit code units, not to characters. This makes it impossible to use QString consistently with Unicode characters whose code points lie beyond U+FFFF, so many historic scripts as well as many mathematical and musical symbols are effectively left out.
This could be solved quite easily. QString would just have to map the character indices passed through its interface to 16-bit indices and vice versa. Since this only needs to be done for strings that actually contain code points above U+FFFF, there would be no performance impact on any other strings. Just keep a boolean flag recording whether the string contains any surrogate pairs, and update this flag in every non-const member function of QString. The flag update is also fast: it takes either constant or linear time, depending on the member function. For example, QString::append(const QString &str) would need only constant time for the update, by writing surrogateFlag |= str.surrogateFlag;
It would also be necessary to extend QChar to 32 bits.
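To make the proposal concrete, here is a minimal sketch of the index mapping and the flag update. It deliberately uses std::u16string rather than QString so that it compiles without Qt. The class Utf16Text and all of its members (unitIndex, length, at, hasSurrogates) are hypothetical names invented for this illustration and are not part of any Qt API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical wrapper (not Qt): UTF-16 storage plus a surrogate flag.
class Utf16Text {
public:
    Utf16Text() = default;
    explicit Utf16Text(std::u16string units)
        : units_(std::move(units)), hasSurrogates_(scan(units_)) {}

    bool hasSurrogates() const { return hasSurrogates_; }

    // The flag update is constant time, as suggested for QString::append().
    void append(const Utf16Text &other) {
        units_ += other.units_;
        hasSurrogates_ |= other.hasSurrogates_;
    }

    // Map a character index to a 16-bit code unit index.
    std::size_t unitIndex(std::size_t charIndex) const {
        if (!hasSurrogates_) return charIndex;  // fast path: indices coincide
        std::size_t unit = 0;
        while (charIndex > 0 && unit < units_.size()) {
            unit += pairAt(unit) ? 2 : 1;
            --charIndex;
        }
        return unit;
    }

    // Length in characters (code points), not in code units.
    std::size_t length() const {
        if (!hasSurrogates_) return units_.size();
        std::size_t count = 0;
        for (std::size_t i = 0; i < units_.size(); i += pairAt(i) ? 2 : 1)
            ++count;
        return count;
    }

    // Character at a character index, returned as a 32-bit code point.
    char32_t at(std::size_t charIndex) const {
        const std::size_t i = unitIndex(charIndex);
        assert(i < units_.size());
        if (!pairAt(i)) return units_[i];
        return 0x10000 + ((char32_t(units_[i]) - 0xD800) << 10)
                       + (char32_t(units_[i + 1]) - 0xDC00);
    }

private:
    static bool isHigh(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
    static bool isLow(char16_t c)  { return c >= 0xDC00 && c <= 0xDFFF; }

    // True if a well-formed surrogate pair starts at code unit i.
    bool pairAt(std::size_t i) const {
        return i + 1 < units_.size() && isHigh(units_[i]) && isLow(units_[i + 1]);
    }

    // Linear-time scan, needed only when constructing from raw code units.
    static bool scan(const std::u16string &s) {
        for (char16_t c : s)
            if (isHigh(c) || isLow(c)) return true;
        return false;
    }

    std::u16string units_;
    bool hasSurrogates_ = false;
};
```

With this sketch, a string such as u"a\U0001D11Eb" (the musical G clef, U+1D11E, between two Latin letters) reports a length of three characters even though it occupies four code units. Strings without surrogates never leave the fast path. Note that at() returns char32_t, which mirrors the point that a 16-bit QChar cannot represent such a character.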