QTBUG-26106

Qt treats thin space (U+2009) as whitespace and strips it

Details

    • Bug
    • Resolution: Incomplete
    • Not Evaluated
    • None
    • 4.8.1
    • XML: DOM
    • None
    • Linux, Ubuntu 12.04

    Description

      When reading an XML document that contains the thin space character (U+2009, written as &#x2009;) as the text content of an element, QDomDocument discards the character.

      Although the QDomDocument documentation says that whitespace is stripped from text nodes, the XML specification does not treat such characters as white space: only "space (#x20) characters, carriage returns, line feeds, or tabs" are whitespace (see http://www.w3.org/TR/REC-xml/#NT-S ). I've also tried other Unicode whitespace characters (U+20xx), and these too are stripped by QDomDocument.
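To make the distinction concrete, here is a minimal sketch of the whitespace test the XML specification's S production implies. It is not Qt's code, and the function names isXmlWhitespace and isXmlWhitespaceOnly are hypothetical, chosen only for this illustration.

```cpp
#include <string>

// Whitespace as defined by the XML 1.0 "S" production:
// space (#x20), tab (#x9), carriage return (#xD), line feed (#xA).
bool isXmlWhitespace(char32_t c) {
    return c == 0x20 || c == 0x9 || c == 0xD || c == 0xA;
}

// Hypothetical helper: true if every code point in the text is XML
// whitespace (an empty string trivially qualifies).
bool isXmlWhitespaceOnly(const std::u32string& text) {
    for (char32_t c : text) {
        if (!isXmlWhitespace(c)) {
            return false;
        }
    }
    return true;
}
```

Under this definition a text node holding only U+2009 is not whitespace-only, so a parser following the specification's definition would be expected to keep it.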

      This problem was encountered while interpreting a MathML document, where text nodes consisting of whitespace characters are common. Please see this thread, where the bug is explored in more detail (thanks to those in the thread for helping me): http://stackoverflow.com/questions/10968940/why-is-qt-losing-my-thin-space-unicode-character-when-loading-an-xml-file

      The problem can be reproduced with just:
      QDomDocument doc;
      doc.setContent(QString("<mtext>&#x2009;</mtext>"));
      Examining doc.toString() then gives just "<mtext/>". I've included a more complete example, which escapes any Unicode characters in its on-screen output.
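Because U+2009 is invisible in a terminal, escaping non-ASCII output makes it clear whether the character survived parsing. The following is only a sketch of that kind of escaping, not the attached example itself: escapeNonAscii is a hypothetical helper, and it assumes the text is already available as UTF-32 code points.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: copy printable ASCII through unchanged and replace
// every other code point with an XML numeric character reference.
std::string escapeNonAscii(const std::u32string& text) {
    std::string out;
    for (char32_t c : text) {
        if (c >= 0x20 && c < 0x7F) {
            out += static_cast<char>(c);
        } else {
            char buf[16];  // large enough for "&#xFFFFFFFF;" plus terminator
            std::snprintf(buf, sizeof buf, "&#x%04X;", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}
```

Applied to the input above, the thin space shows up as &#x2009; in the escaped text, whereas the stripped result "<mtext/>" passes through unchanged.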

          People

            Assignee: Unassigned
            Reporter: Jeremy Sanders (jeremysanders)
            Votes: 0
            Watchers: 2
