Description
This appears to only occur if Py_UNICODE_WIDE is not defined.
The PySide2 build that produces the error has the following preprocessor defines:
Py_LIMITED_API is not defined.
QT_VERSION is < QT_VERSION_CHECK(6,0,0)
Py_UNICODE_WIDE is not defined
The erroneous code is this:
// Python to C++ conversions for type 'QString'.
static void PyUnicode_PythonToCpp_QString(PyObject *pyIn, void *cppOut) {
    // ========================================================================
    // START of custom code block [file: ../glue/qtcore.cpp (conversion-pyunicode)]
#ifndef Py_LIMITED_API
    Py_UNICODE *unicode = PyUnicode_AS_UNICODE(pyIn);
#  if defined(Py_UNICODE_WIDE)
    // cast as Py_UNICODE can be a different type
#    if QT_VERSION >= QT_VERSION_CHECK(6, 0, 0)
    *reinterpret_cast<::QString *>(cppOut) = QString::fromUcs4(reinterpret_cast<const char32_t *>(unicode));
#    else
    *reinterpret_cast<::QString *>(cppOut) = QString::fromUcs4(reinterpret_cast<const uint *>(unicode));
#    endif // Qt 6
#  else // Py_UNICODE_WIDE
#    if QT_VERSION >= QT_VERSION_CHECK(6, 0, 0)
    *reinterpret_cast<::QString *>(cppOut) = QString::fromUtf16(reinterpret_cast<const char16_t *>(unicode), PepUnicode_GetLength(pyIn));
#    else
    *reinterpret_cast<::QString *>(cppOut) = QString::fromUtf16(reinterpret_cast<const ushort *>(unicode), PepUnicode_GetLength(pyIn));
#    endif // Qt 6
#  endif
#else
    wchar_t *temp = PyUnicode_AsWideCharString(pyIn, NULL);
    *reinterpret_cast<::QString *>(cppOut) = QString::fromWCharArray(temp);
    PyMem_Free(temp);
#endif
    // END of custom code block [file: ../glue/qtcore.cpp (conversion-pyunicode)]
    // ========================================================================
}
With the preprocessor defines listed above, the following branch is compiled and called:
*reinterpret_cast<::QString *>(cppOut) = QString::fromUtf16(reinterpret_cast<const ushort *>(unicode), PepUnicode_GetLength(pyIn));
PLEASE NOTE: Qt's Jira apparently does not allow emoji to be contained within issue descriptions (it should be upgraded - other Jira instances do support this).
Below, I refer to the emoji by name as right-angled-magnifying-glass-emoji.
I gather that PepUnicode_GetLength(pyIn) returns a length in code points rather than a length in UTF-16 code units (ushort elements).
Looking at the PySide2 code defining PepUnicode_GetLength and reading the docs, I think the behavior is likely correct if Python 2 is used because PyUnicode_GetSize is called - which will "Return the size of the deprecated Py_UNICODE representation, in code units (this includes surrogate pairs as 2 units)."†
The behavior is incorrect if Python 3 is used because PyUnicode_GetLength is called, which will "Return the length of the Unicode object, in code points."
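To make the suspected failure concrete, here is a minimal standard-library sketch that does not use Python or Qt. The helper names countCodePoints and copyUnits are hypothetical; copyUnits only stands in for a conversion that, like QString::fromUtf16(ptr, size), consumes exactly the number of code units it is told to. It assumes the input buffer is well-formed UTF-16.

```cpp
#include <cstddef>
#include <string>

// Count code points in well-formed UTF-16: every code unit except a
// low surrogate (0xDC00..0xDFFF) starts a new code point.
std::size_t countCodePoints(const std::u16string &utf16)
{
    std::size_t n = 0;
    for (char16_t u : utf16) {
        if (u < 0xDC00 || u > 0xDFFF)
            ++n;
    }
    return n;
}

// Hypothetical stand-in for the conversion: copies exactly `length`
// code units from the buffer, whatever that length was derived from.
std::u16string copyUnits(const std::u16string &utf16, std::size_t length)
{
    return std::u16string(utf16.data(), length);
}
```

For the text "ab" followed by U+1F50E (assumed here to be the emoji in question), the buffer holds 4 code units but only 3 code points. Passing the code point count as the length copies 3 units, so the result ends in a lone high surrogate and the emoji is lost.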
It appears that a code unit is the fixed-size storage element of an encoding (a ushort for UTF-16), and a code point is a single Unicode character - which could be represented either with a single code unit, or with multiple code units - like in the case of right-angled-magnifying-glass-emoji.
This Wikipedia article provides definitions of what "code unit" and "code point" are. For characters in the Basic Multilingual Plane the two counts coincide in UTF-16, but they are not interchangeable in general - a character such as right-angled-magnifying-glass-emoji needs two 16-bit code units (a surrogate pair).
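As an illustration of the code unit versus code point distinction, the following sketch encodes a single code point as UTF-16. The function name encodeUtf16 is hypothetical, and the code assumes its input is a valid Unicode scalar value (at most 0x10FFFF and not itself a surrogate).

```cpp
#include <string>

// Encode one Unicode code point as UTF-16 code units.
// Assumption: cp is a valid scalar value (<= 0x10FFFF, not a surrogate).
std::u16string encodeUtf16(char32_t cp)
{
    std::u16string out;
    if (cp < 0x10000) {
        // Basic Multilingual Plane: one code unit per code point.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary planes: split the 20-bit offset into a
        // high surrogate and a low surrogate.
        const char32_t v = cp - 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
    }
    return out;
}
```

Assuming the emoji is U+1F50E, encodeUtf16(0x1F50E) yields the two code units 0xD83D and 0xDD0E, while a character such as 'A' yields one.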
After reading that Wikipedia article's definition of code unit, I am still somewhat confused. It does, however, resolve my earlier confusion: the Python docs show that PyUnicode_GET_SIZE and PyUnicode_GET_DATA_SIZE (which I thought dealt with byte lengths) are deprecated and replaced with PyUnicode_GET_LENGTH (which returns the length in code points).
So - I don't see any Python API that will return the length of a PyUnicode object in UTF-16 code units (ushort elements) - which is what QString::fromUtf16 needs in order to create a QString from it.
What I'm trying to sort out from all this is - what would be the proper fix in PySide?
It does seem like this ../glue/qtcore.cpp (conversion-pyunicode) does indeed contain a bug and needs changing - I just don't understand how to get the proper length to pass to QString::fromUtf16.
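One conceivable way to get a suitable length, sketched here with the standard library only, is to derive the UTF-16 code unit count from the code points themselves. This is not presented as the PySide fix; utf16LengthOf is a hypothetical helper, and it assumes every element of the input is a valid code point.

```cpp
#include <cstddef>
#include <string>

// Number of UTF-16 code units needed to represent a sequence of
// code points: 1 for each BMP character, 2 for each character
// above 0xFFFF (which needs a surrogate pair).
// Assumption: every element is a valid Unicode code point.
std::size_t utf16LengthOf(const std::u32string &codePoints)
{
    std::size_t units = 0;
    for (char32_t cp : codePoints)
        units += (cp > 0xFFFF) ? 2 : 1;
    return units;
}
```

For "ab" followed by U+1F50E this gives 4, which matches the number of ushort elements in the UTF-16 buffer, whereas the code point count is 3.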
† This quotation is from the Python 3 documentation. It applies to Python 2 as well, and describes the function better than Python 2's own documentation does.
Issue Links
- is duplicated by PYSIDE-2093 Unicode 32 fonts do not render properly, shows garbage icon (Closed)