  1. Qt
  2. QTBUG-106543

Develop a method to encode more than UINT_MAX in qCompress() length field

Details

    • Type: Task
    • Resolution: Unresolved
    • Priority: P3: Somewhat important
    • None
    • 6.0.4, 6.1.3, 6.2.5, 6.3.1, 6.4.0 Beta4
    • None
    • 8
    • Foundation PM Staging

    Description

      While working on QTBUG-104972, which fixed the most obvious problems with widening the functions from 32-bit to 64-bit-sized Qt containers, it turned out that we use an ad-hoc format in which the decompressed data's length is prepended to the compressed data as a 32-bit unsigned big-endian integer.
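
      The header layout described above can be sketched as follows. This is a minimal illustration, not the Qt implementation; the helper names encodeLengthBE() and decodeLengthBE() are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not Qt API): serialize a 32-bit length as four
// big-endian bytes, most significant byte first.
std::array<std::uint8_t, 4> encodeLengthBE(std::uint32_t len)
{
    return { static_cast<std::uint8_t>(len >> 24),
             static_cast<std::uint8_t>(len >> 16),
             static_cast<std::uint8_t>(len >> 8),
             static_cast<std::uint8_t>(len) };
}

// Hypothetical helper (not Qt API): read the four header bytes back
// into a 32-bit unsigned value.
std::uint32_t decodeLengthBE(const std::array<std::uint8_t, 4> &hdr)
{
    return (std::uint32_t{hdr[0]} << 24) | (std::uint32_t{hdr[1]} << 16)
         | (std::uint32_t{hdr[2]} << 8)  |  std::uint32_t{hdr[3]};
}
```

      Because the field is only four bytes wide, any length above UINT_MAX cannot be stored verbatim, which is the limitation this task is about.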

      Upon uncompressing, we use that field as a hint for the output buffer, but if the buffer turns out to be too small (Z_BUF_ERROR), we double the buffer's size and try again.

      In this way, 64-bit platforms can actually qCompress() more than 4GiB of data (except Windows, cf. QTBUG-106542) and qUncompress() can decompress it again, albeit at the expense of several rounds that end in Z_BUF_ERROR.

      This problem is exacerbated by the current code simply narrowing the input size to 32 bits: an input size of UINT_MAX + 1 is narrowed to 0 and therefore starts with a 1-byte buffer, which is resized 32 times until it's 4GiB and can finally hold the output.
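
      The cost of narrowing plus doubling can be estimated with a small model. This is a sketch under assumptions, not Qt code: simulateDoubling() and RetryResult are hypothetical names, the starting buffer is assumed to be at least 1 byte, and a round is assumed to succeed as soon as the buffer is at least as large as the real output.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct RetryResult {
    std::uint64_t finalBufferSize; // size of the buffer that finally fits
    int resizes;                   // number of doublings, i.e. failed rounds
};

// Hypothetical model (not Qt code): narrow the real size to 32 bits, start
// with at least 1 byte, and double the buffer until the output fits.
RetryResult simulateDoubling(std::uint64_t realSize)
{
    // Bound the input so that doubling can never overflow 64 bits.
    assert(realSize > 0 && realSize <= (std::uint64_t{1} << 62));
    const std::uint32_t header = static_cast<std::uint32_t>(realSize); // narrowing
    std::uint64_t buffer = std::max<std::uint64_t>(header, 1);
    int resizes = 0;
    while (buffer < realSize) { // stands in for one Z_BUF_ERROR round
        buffer *= 2;
        ++resizes;
    }
    return { buffer, resizes };
}
```

      In this model, a 5GiB input is narrowed to a 1GiB hint and needs three doublings, ending with an 8GiB buffer.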

      This task is about finding a way to encode the real length that

      1. old code doesn't choke on, and
      2. allows new code to calculate the right buffer size on the first try.

      Failing that, we should at the very least minimize the number of rounds with Z_BUF_ERROR.

      Some ideas:

      • using 0xffff'ffff for anything ≥ 4GiB (saturation arithmetic, minimizes rounds)
      • using a length > INT_MAX and < UINT_MAX that, when repeatedly doubled, produces a buffer minimally larger than the real buffer (minimizes overallocation)
      • using a floating-point encoding, provided the value interpreted as a uint is > INT_MAX and ≤ the real value
        • untested example: 0b1EEE'EEES 0xSS 0xSS 0xSS where E is a 6-bit unsigned exponent and S is a 25-bit unsigned significand (could use the MSB as part of the significand)
      • ...
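
      The first two ideas could look roughly like this. It is an untested-in-Qt sketch with hypothetical function names; the second function assumes that the decoder keeps doubling the hint, and it picks the header by halving the real size (rounding up) until it fits into 32 bits.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kUintMax = 0xffff'ffffu;

// Idea 1 (saturation): store the real size if it fits, else 0xffff'ffff.
std::uint32_t saturatedHeader(std::uint64_t realSize)
{
    return realSize > kUintMax ? 0xffff'ffffu
                               : static_cast<std::uint32_t>(realSize);
}

// Idea 2 (doubling-friendly): find the header h with the fewest doublings k
// such that h * 2^k >= realSize. Repeated round-up halving computes
// ceil(realSize / 2^k) for the smallest k that fits into 32 bits, so the
// overallocation h * 2^k - realSize stays below 2^k bytes.
std::uint32_t doublingFriendlyHeader(std::uint64_t realSize)
{
    assert(realSize <= (std::uint64_t{1} << 62)); // arbitrary sanity bound
    std::uint64_t h = realSize;
    while (h > kUintMax)
        h = h / 2 + h % 2; // halve, rounding up
    return static_cast<std::uint32_t>(h);
}
```

      For inputs above UINT_MAX, the second function always returns a value greater than INT_MAX, because the loop stops at the first halving that fits into 32 bits. Sizes that already fit are stored unchanged by both functions.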

      Acceptance criteria: qCompress() encodes the length field such that

      1. old qUncompress(), interpreting it as a 32-bit signed BE field, continues to work
        • succeeds if it would have succeeded with the old format
      2. new code decodes the length field such that the resulting length is larger than the output data
        • but not by more than 100%
      3. if any of the above is unachievable, the fall-back is to minimize the number of required Z_BUF_ERROR rounds in qUncompress() (old and new versions)
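
      Criterion 2 can be phrased as a simple predicate for use in tests. This is a sketch with a hypothetical name; it assumes "larger than" also admits an exact fit, and reads "not by more than 100%" as "at most twice the real size".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical check for criterion 2: the decoded length must cover the
// output data, but exceed it by no more than 100%.
bool satisfiesCriterion2(std::uint64_t decodedLength, std::uint64_t realSize)
{
    assert(realSize <= UINT64_MAX / 2); // keep 2 * realSize from overflowing
    return decodedLength >= realSize && decodedLength <= 2 * realSize;
}
```

      A decoded length of 8GiB for 5GiB of output passes this check, while 1GiB (too small) and 11GiB (more than double) do not.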

            People

              cnn Qt Core & Network
              mmutz Marc Mutz
              Vladimir Minenko
              Alex Blasche