Details
Bug
Resolution: Unresolved
P2: Important
None
5.15.18
None
5
Description
The XML spec says¹ that
Characters referred to using character references MUST match the production for Char.
And the Char production² reads
Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
eddy found, though, that qdom.cpp:encodeText() encodes a surrogate pair as two character references, each expanding to a lone surrogate code point, which is exactly what the spec expressly forbids.
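For illustration, a minimal sketch of the behaviour the spec asks for, written in plain standard C++ rather than against Qt. It is not the code in qdom.cpp, and the names isXmlChar() and toCharRefs() are hypothetical. It combines each UTF-16 surrogate pair into one code point before writing a single character reference, and checks every code point against the Char production quoted above. Silently skipping code points that fail the check (including unpaired surrogates) is an assumption of this sketch, not a statement about what Qt does or should do.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// True if cp matches the XML 1.0 Char production quoted above.
bool isXmlChar(char32_t cp)
{
    return cp == 0x9 || cp == 0xA || cp == 0xD
        || (cp >= 0x20 && cp <= 0xD7FF)
        || (cp >= 0xE000 && cp <= 0xFFFD)
        || (cp >= 0x10000 && cp <= 0x10FFFF);
}

// Hypothetical helper: writes every code point of a UTF-16 string as a
// hexadecimal character reference. A high/low surrogate pair becomes ONE
// reference to the combined code point, never two references.
std::string toCharRefs(const std::u16string &text)
{
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        char32_t cp = text[i];
        const bool isHigh = cp >= 0xD800 && cp <= 0xDBFF;
        if (isHigh && i + 1 < text.size()) {
            const char16_t low = text[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                // Standard UTF-16 decoding of a surrogate pair.
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i; // the low surrogate has been consumed
            }
        }
        if (!isXmlChar(cp))
            continue; // assumption: drop anything Char does not allow
        char buf[16];
        std::snprintf(buf, sizeof buf, "&#x%X;", static_cast<unsigned>(cp));
        out += buf;
    }
    return out;
}
```

With this sketch, U+1F600 (stored in UTF-16 as the pair D83D DE00) is written as the single reference `&#x1F600;` instead of one reference per surrogate.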
¹ https://www.w3.org/TR/REC-xml/#wf-Legalchar
² https://www.w3.org/TR/REC-xml/#NT-Char