Qt / QTBUG-60615

QXmlStreamEntityResolver applies XML character restrictions to resolved entities


Details

    • Type: Bug
    • Resolution: Out of scope
    • Priority: Not Evaluated
    • None
    • Affects Version: 5.8.0
    • None
    • Environment: Debian Linux, using the online installer for Qt 5.8.

    Description

      I have a requirement to store ASCII control codes as part of a system that may include ANSI escape sequences, and so may contain the ASCII ESC character (0x1b). As I am now well aware, QXmlStreamReader and QXmlStreamWriter support only XML 1.0 and NOT 1.1, so those codes are prohibited in such a data stream.
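
      To make the restriction concrete, here is a minimal sketch of the XML 1.0 "Char" production as a plain C++ predicate. The function name isXml10Char is made up for illustration; it is not a Qt API and is not necessarily how Qt performs its own check.

```cpp
#include <cstdint>

// Returns true if the code point matches the XML 1.0 "Char" production:
//   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
// (isXml10Char is a hypothetical helper, not part of Qt.)
constexpr bool isXml10Char(std::uint32_t cp)
{
    return cp == 0x9 || cp == 0xA || cp == 0xD
        || (cp >= 0x20 && cp <= 0xD7FF)
        || (cp >= 0xE000 && cp <= 0xFFFD)
        || (cp >= 0x10000 && cp <= 0x10FFFF);
}

// ESC (0x1b) falls in the excluded C0 control range; tab and line feed do not.
static_assert(!isXml10Char(0x1B), "ESC is not a legal XML 1.0 character");
static_assert(isXml10Char(0x09), "tab is legal");
static_assert(isXml10Char(0x0A), "line feed is legal");
```

      Of the C0 controls, only tab, line feed and carriage return pass, which is why ESC cannot appear literally in an XML 1.0 document.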

      I then thought to replace them with XML entities such as "&ESC;". It was fiddly, but I managed to split my data into runs of "normal" data so that I could use, e.g., QXmlStreamWriter::writeEntityReference(QLatin1String("ESC")) to insert a "replacement" entity between them. In the reader, however, I ran into a problem. I tried the following, which seems to be the way to do the reverse:

      Header:

      #include <QString>
      #include <QXmlStreamEntityResolver>
      
      class TXmlEntityResolver : public QXmlStreamEntityResolver {
      public:
          TXmlEntityResolver() {}
      
          // Called by QXmlStreamReader for entities the document does not declare.
          QString resolveUndeclaredEntity(const QString &name) override {
              if (name == QLatin1String("ESC")) {
                  // The single ESC character, U+001B.
                  return QStringLiteral("\x1b");
              }
      
              return QStringLiteral("*Unknown-Xml-Entity-%1*").arg(name);
          }
      };
      

      Then, inside a QXmlStreamReader-derived class, I use the above as follows:

          TXmlEntityResolver myResolver;
          setEntityResolver(&myResolver);
      
          // stuff...
      
          // The element I am trying to read here DOES contain an "&ESC;" entity:
          QString script = readElementText();
          if (error() != NoError) {
              qDebug() << "XMLimport::readScriptElement() ERROR:"
                       << errorString();
          }
      
      

      I found that if I then try and parse an XML file with something like the following:

                  ...
                  <name>test XML error</name>
                  <script>function breakMe()
      	echo(&quot;\n&ESC;[1;38;41mIs this red?&ESC;[0m\n&quot;)
      end</script>
                  <somethingElse/>
                  ....
      

      the entity IS replaced (with the UTF-16 code point for the ASCII ESC character): I can trace the replacement happening in resolveUndeclaredEntity(...) in the debugger. However, parsing of the rest of the element is immediately aborted, so the script QString ends up containing:

      function breakMe()
      echo("

      and DOES NOT produce any error. It seems that reading is aborted if the replacement for the entity is NOT something that would be valid in XML 1.0, even though I was trying to use an entity to avoid putting invalid data in the XML in the first place!

      Am I pushing the envelope too far here? Should readElementText() be allowed to return data that could not appear directly in an XML 1.0 data stream?
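
      If the reader does reject replacement text outside the XML 1.0 character range (which the behaviour above suggests, but which is not confirmed here), one possible fallback is to have the resolver return a legal placeholder character and to swap it for the real control code after readElementText() returns. The sketch below shows only that post-processing step in plain C++. The choice of U+241B (SYMBOL FOR ESCAPE) as the placeholder and the names kEscPlaceholder and restoreEscapes are assumptions made for illustration.

```cpp
#include <algorithm>
#include <string>

// Hypothetical placeholder: U+241B SYMBOL FOR ESCAPE, which lies inside the
// XML 1.0 character range and so could be returned by an entity resolver.
constexpr char16_t kEscPlaceholder = u'\u241B';

// Replace every placeholder in the parsed text with the real ESC control
// code (0x1b). Operates on UTF-16 code units, as QString does.
std::u16string restoreEscapes(std::u16string text)
{
    std::replace(text.begin(), text.end(), kEscPlaceholder, char16_t(0x1B));
    return text;
}
```

      The obvious cost is that genuine U+241B characters in the data would also be converted, so the placeholder would have to be one that never occurs in the stored scripts.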

      Edited: to correct error in specifying QXmlSimpleXXXX classes instead of QXmlStreamXXXX!
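
      For reference, the writer-side splitting described earlier (runs of "normal" data separated by entity references) can be sketched roughly as follows. This is plain C++ with no Qt dependency; the Segment struct and the splitIntoRuns function are hypothetical names, not the code actually used in the project.

```cpp
#include <string>
#include <vector>

// One piece of the output: either literal text or a named entity reference.
// (Segment and splitIntoRuns are hypothetical names used for illustration.)
struct Segment {
    bool isEntity;      // true: 'text' holds an entity name such as "ESC"
    std::string text;   // literal characters, or the entity name
};

// Split 'input' at every ESC (0x1b) byte. Literal runs are kept unchanged;
// each ESC becomes its own entity segment named "ESC".
std::vector<Segment> splitIntoRuns(const std::string &input)
{
    std::vector<Segment> segments;
    std::string run;
    for (char c : input) {
        if (c == '\x1b') {
            if (!run.empty()) {
                segments.push_back({false, run});
                run.clear();
            }
            segments.push_back({true, "ESC"});
        } else {
            run += c;
        }
    }
    if (!run.empty())
        segments.push_back({false, run});
    return segments;
}
```

      Each literal segment would then be passed to QXmlStreamWriter::writeCharacters() and each entity segment to QXmlStreamWriter::writeEntityReference(), which yields "&ESC;" in the output file.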


          People

            Assignee: Thiago Macieira (thiago)
            Reporter: Stephen Lyons (slysven)