Details

Type: Bug
Resolution: Incomplete
Priority: Not Evaluated
Fix Version/s: None
Affects Version/s: 4.8.1
Component/s: XML: DOM
Labels:
None
Environment:
Linux, Ubuntu 12.04

Description

When reading an XML document that contains the thin space character (U+2009) as the text content of an element, QDomDocument discards this character.

The QDomDocument documentation does say that whitespace is stripped from text nodes, but the XML specification does not treat this character as white space: only "space (#x20) characters, carriage returns, line feeds, or tabs" count as white space (see http://www.w3.org/TR/REC-xml/#NT-S). I have also tried other Unicode space characters in the U+20xx range, and QDomDocument strips these too.
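The white-space rule cited above is small enough to write out directly. Below is a minimal C++17 sketch using only the standard library; the function names are hypothetical and are not part of Qt or any XML parser's API. It encodes the spec's S production and the matching "whitespace-only text" test. Under this definition, a text node holding only U+2009 is not whitespace-only.

```cpp
#include <string_view>

// XML 1.0 production S: S ::= (#x20 | #x9 | #xD | #xA)+
// Only these four code points are white space as far as XML is concerned.
// (Hypothetical helper name, for illustration only.)
constexpr bool is_xml_space(char32_t c) {
    return c == 0x20 || c == 0x09 || c == 0x0D || c == 0x0A;
}

// True if `text` is non-empty and consists entirely of XML white space,
// i.e. it matches S (the "+" means at least one character is required).
// (Hypothetical helper name, for illustration only.)
constexpr bool is_xml_whitespace_only(std::u32string_view text) {
    if (text.empty()) return false;
    for (char32_t c : text) {
        if (!is_xml_space(c)) return false;
    }
    return true;
}

// Compile-time checks: U+2009 THIN SPACE is a Unicode space separator
// (general category Zs) but it does not match S.
static_assert(is_xml_whitespace_only(U" \t\r\n"));
static_assert(!is_xml_whitespace_only(U"\u2009"));
static_assert(!is_xml_whitespace_only(U" \u2009 "));
```

The static_asserts are evaluated by the compiler, so the sketch documents the distinction without needing a test harness.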

I ran into this problem while interpreting a MathML document, where text nodes consisting only of space characters such as the thin space are common. This thread explores the bug in more detail (thanks to those in the thread for helping me): http://stackoverflow.com/questions/10968940/why-is-qt-losing-my-thin-space-unicode-character-when-loading-an-xml-file

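The problem can be reproduced with just:

    QDomDocument doc;
    // Build "<mtext>", U+2009 THIN SPACE, "</mtext>" explicitly
    doc.setContent(QString("<mtext>") + QChar(0x2009) + QString("</mtext>"));

and then examining doc.toString(), which is just "<mtext/>". I've attached a more complete example (test.cc), which escapes any Unicode characters in its output to the screen.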
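Because U+2009 looks like an ordinary space when printed, the output has to be escaped before you can tell whether the character survived. The following standard-library-only C++ sketch shows one way to do that. It is not the attached test.cc, and the helper name escape_non_ascii is hypothetical. It assumes the text has already been decoded to UTF-32 code points.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <string_view>

// Hypothetical helper (not Qt API, not the attached test.cc): render text as
// printable ASCII by replacing every non-ASCII code point with an XML
// character reference, so U+2009 prints as "&#x2009;" instead of looking
// like an ordinary space.
std::string escape_non_ascii(std::u32string_view text) {
    std::string out;
    for (char32_t c : text) {
        if (c < 0x80) {
            out += static_cast<char>(c);  // ASCII passes through unchanged
        } else {
            assert(c <= 0x10FFFF && "not a Unicode code point");
            char buf[16];  // large enough for "&#x10FFFF;" plus terminator
            std::snprintf(buf, sizeof buf, "&#x%X;", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}
```

Applied to `U"<mtext>\u2009</mtext>"`, the helper returns `<mtext>&#x2009;</mtext>`, while `U"<mtext/>"` comes back unchanged. This makes a preserved thin space easy to distinguish from a dropped one.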

Attachments

test.cc (0.8 kB), attached by Jeremy Sanders on 10 Jun '12 21:04

People

Assignee: Unassigned

Reporter: Jeremy Sanders

Votes: 0

Watchers: 2

Dates

Created: 10 Jun '12 21:04

Updated: 21 Sep '18 13:15

Resolved: 19 Sep '14 15:15

Gerrit Reviews

There are no open Gerrit changes