Description
I am having trouble writing a special UTF-8 character into an XML document. This character comes from a UTF-8 encoded XML document, where it is encoded on 4 bytes. I simply need to save it into another XML document.
Depending on the method I use, I get one of the following:
- a string that does not represent my special char at all
- an invalid entity
- an empty element
whereas I would expect my special character to be encoded on 4 bytes again.
(See http://qt-project.org/forums/viewthread/16273/#81966 for a more readable code sample.)
QDomDocument xmlDoc;

//create a string containing a UTF-8 char encoded on 4 bytes (note: this is a valid char coming from a valid XML file encoded in UTF-8)
QByteArray originalSpecialChar;
originalSpecialChar.append(0xF0);
originalSpecialChar.append(0x9D);
originalSpecialChar.append(0x8C);
originalSpecialChar.append(0x86);

//put it in a string (thus converted to Unicode, but it keeps the right character)
QString originalSpecialCharInString = QString::fromUtf8(originalSpecialChar.constData(), 4);

//add this string into a new XML doc (encoded in UTF-8)
xmlDoc.appendChild(xmlDoc.createProcessingInstruction("xml", "version=\"1.0\" encoding=\"UTF-8\""));
QDomElement rootNode = xmlDoc.createElement("RootNode");
xmlDoc.appendChild(rootNode);
QDomText textNode = xmlDoc.createTextNode(originalSpecialCharInString);
rootNode.appendChild(textNode);

//at this point, the special char is still correct in the QDomDocument (so the conversion UTF-8 -> Unicode -> UTF-8 actually works!)
if (textNode.nodeValue().toUtf8() != originalSpecialChar)
    qDebug() << "invalid (1)"; //this does not show

//save the xml doc into a QByteArray (using save)
QByteArray xmlContent;
QTextStream textStream(&xmlContent);
xmlDoc.save(textStream, 0, QDomNode::EncodingFromDocument); //note: same result if I force the textStream codec to UTF-8 and use EncodingFromTextStream
qDebug() << xmlContent; //shows <?xml version="1.0" encoding="UTF-8"?><RootNode>#xdf06;</RootNode>
//the node contains the string "#xdf06". This is really not the character I expect

//save with toString()
qDebug() << xmlDoc.toString(0); //shows <?xml version="1.0" encoding="UTF-8"?><RootNode></RootNode>
//Qt is actually able to read this document, but a C# client is not, because it actually contains an invalid entity. If I use QDomImplementation::DropInvalidChars, the element is empty, so Qt knows it is invalid.
qDebug() << xmlDoc.toString(0).toUtf8(); //shows <?xml version="1.0" encoding="UTF-8"?><RootNode>#xdf06;</RootNode>
//it does not help!

//what I would expect (this is actually what my original xml file looked like):
//<?xml version="1.0" encoding="UTF-8"?><RootNode>my original char coded on 4 bytes in the utf-8 doc</RootNode>
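
A note on the "#xdf06;" value seen in the output: the bytes F0 9D 8C 86 decode to the code point U+1D306, which lies outside the Basic Multilingual Plane and is therefore stored as two UTF-16 code units (a surrogate pair), and DF06 is the low half of that pair. The sketch below works through the arithmetic in plain standard C++ without Qt. The helper names decodeUtf8FourBytes and toSurrogatePair are made up for illustration and are not Qt API. This only suggests that the two UTF-16 code units may be handled separately during serialization; it does not show where in Qt that would happen.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper (not part of Qt): decode one 4-byte UTF-8 sequence
// into a Unicode code point. Assumes the input is well formed, i.e. a lead
// byte of the form 11110xxx followed by three 10xxxxxx continuation bytes.
std::uint32_t decodeUtf8FourBytes(const unsigned char* b)
{
    assert((b[0] & 0xF8) == 0xF0);
    return (static_cast<std::uint32_t>(b[0] & 0x07) << 18)
         | (static_cast<std::uint32_t>(b[1] & 0x3F) << 12)
         | (static_cast<std::uint32_t>(b[2] & 0x3F) << 6)
         |  static_cast<std::uint32_t>(b[3] & 0x3F);
}

// Hypothetical helper (not part of Qt): split a code point above U+FFFF
// into its UTF-16 surrogate pair, returned as (high, low).
std::pair<std::uint16_t, std::uint16_t> toSurrogatePair(std::uint32_t codePoint)
{
    assert(codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    const std::uint32_t v = codePoint - 0x10000;
    return { static_cast<std::uint16_t>(0xD800 + (v >> 10)),
             static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF)) };
}
```

For the four bytes used in this report, decodeUtf8FourBytes returns 0x1D306 and toSurrogatePair(0x1D306) returns (0xD834, 0xDF06). A lone 0xDF06 is not a valid XML character on its own, which would be consistent with the C# client rejecting the document.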