Qt: QTBUG-51283

Multi-byte characters damaged during SAX parsing


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: P4: Low
    • Fix Version/s: None
    • Affects Version/s: 4.5.0
    • Component/s: XML: DOM
    • Labels: None
    • Environment: Linux (EEEBuntu 3)

    Description

      When a portion of incoming byte data has been consumed (the data comes from a QIODevice, originally a QNetworkReply reading from the network), QXmlInputSource returns the special QXmlInputSource::EndOfData character (0xFFFE). This can be intercepted by overriding the QXmlInputSource::next() method. It looks like a multi-byte character is damaged when it is split across two portions.
      In my case, UTF-8 XML comes from the network. By the time the damaged character reaches the handler methods, it has been replaced by the bytes EF BF BD (U+FFFD, the Unicode "replacement character").
      This error is not easy to provoke. In my case, the XML with Cyrillic letters went through block encryption/decryption filters, so it arrives in portions of the cipher's block size. But data from the network may legitimately be split at byte boundaries, not at character boundaries!
      I discovered this on an old Qt version, but I suspect the error exists in every version. To reproduce it, parse a stream from the network. On the server side (in my case, PHP under Apache), send the bytes of a multi-byte character in separate writes, calling flush() with a pause between them.
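      The failure mode can be modelled without Qt or a network. The C++17 sketch below is not Qt code: the hypothetical functions splitIntoBlocks() and decodeChunkStateless() stand in for a filter that emits fixed-size blocks and a UTF-8 decoder that keeps no state between portions. A lead byte left at the end of one block and the continuation byte at the start of the next each decode to U+FFFD. How many replacement characters Qt 4.5 itself produces per split letter is not established here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split a byte string into fixed-size blocks, as a block cipher filter would.
// Block boundaries ignore UTF-8 character boundaries.
std::vector<std::string> splitIntoBlocks(const std::string& bytes, std::size_t blockSize) {
    assert(blockSize > 0);
    std::vector<std::string> blocks;
    for (std::size_t pos = 0; pos < bytes.size(); pos += blockSize)
        blocks.push_back(bytes.substr(pos, blockSize));
    return blocks;
}

// Decode ONE chunk of UTF-8 with no state carried over from the previous chunk.
// A truncated sequence at the end, or continuation bytes at the start, become
// U+FFFD (EF BF BD in UTF-8). Simplified: no overlong/surrogate checks.
std::u32string decodeChunkStateless(const std::string& chunk) {
    std::u32string out;
    std::size_t i = 0;
    while (i < chunk.size()) {
        const unsigned char lead = static_cast<unsigned char>(chunk[i]);
        const std::size_t len = lead < 0x80 ? 1
                              : (lead >> 5) == 0x06 ? 2
                              : (lead >> 4) == 0x0E ? 3
                              : (lead >> 3) == 0x1E ? 4
                              : 0;  // 0 = stray continuation or invalid lead byte
        if (len == 0 || i + len > chunk.size()) {
            out.push_back(U'\uFFFD');  // the "replacement character"
            ++i;
            continue;
        }
        char32_t cp = (len == 1) ? lead : (lead & (0x7F >> len));
        bool valid = true;
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(chunk[i + k]);
            if ((c & 0xC0) != 0x80) { valid = false; break; }
            cp = (cp << 6) | (c & 0x3F);
        }
        if (!valid) { out.push_back(U'\uFFFD'); ++i; continue; }
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Decode every block independently and concatenate the results.
std::u32string decodeBlocksStateless(const std::string& bytes, std::size_t blockSize) {
    std::u32string out;
    for (const std::string& block : splitIntoBlocks(bytes, blockSize))
        out += decodeChunkStateless(block);
    return out;
}
```

      For example, decodeBlocksStateless() on the 12 UTF-8 bytes of "Привет" with a 3-byte block size yields П, U+FFFD, U+FFFD, и, в, U+FFFD, U+FFFD, т, while a block size of 2 or 12 leaves the text intact because every block then ends on a character boundary.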

      I wonder whether there is a workaround. Catching QXmlInputSource::EndOfData in QXmlInputSource::next() doesn't help.
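      One possible workaround, assuming you control the layer that feeds bytes to the parser (here, the decryption filter), is to never hand the parser a portion that ends inside a UTF-8 sequence. The C++17 sketch below is an untested suggestion, not a Qt API; completeUtf8Prefix() and Utf8ChunkAligner are hypothetical names. It walks back from the end of the buffer to the last lead byte and holds that character back until the rest of its bytes arrive.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Length of the longest prefix of `buf` that does not end in the middle of a
// UTF-8 sequence. Assumes the data is otherwise well-formed UTF-8; malformed
// tails are passed through unchanged for the parser to reject.
std::size_t completeUtf8Prefix(const std::string& buf) {
    std::size_t i = buf.size();
    std::size_t back = 0;  // bytes examined from the end, including buf[i]
    // Examine at most the last 4 bytes: one lead byte plus up to 3 continuation bytes.
    while (i > 0 && back < 4) {
        --i;
        ++back;
        const unsigned char b = static_cast<unsigned char>(buf[i]);
        if ((b & 0xC0) == 0x80) continue;  // continuation byte: keep walking back
        const std::size_t len = b < 0x80 ? 1
                              : (b >> 5) == 0x06 ? 2
                              : (b >> 4) == 0x0E ? 3
                              : (b >> 3) == 0x1E ? 4
                              : 1;  // invalid lead byte: do not hold it back
        return back >= len ? buf.size() : i;  // incomplete: cut before the lead byte
    }
    return buf.size();
}

// Hypothetical adapter for the filter that feeds the parser: it appends each
// decrypted block and releases only whole UTF-8 characters, keeping an
// incomplete tail (at most 3 bytes) until the next block arrives.
class Utf8ChunkAligner {
public:
    std::string push(const std::string& block) {
        pending_ += block;
        const std::size_t n = completeUtf8Prefix(pending_);
        assert(n <= pending_.size());  // sanity check on the helper
        std::string ready = pending_.substr(0, n);
        pending_.erase(0, n);
        return ready;
    }

    // At end of stream, return whatever is left (a truncated character, if any).
    std::string flush() {
        std::string rest;
        rest.swap(pending_);
        return rest;
    }

private:
    std::string pending_;
};
```

      In a QIODevice-based filter this would presumably mean returning only the output of push() from readData() and calling flush() at end of stream; that wiring is an assumption and is not shown. Because at most three bytes are ever held back, the extra latency is bounded by one block.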


          People

            Assignee: Unassigned
            Reporter: Michael Shestero (shestero)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
