Uploaded image for project: 'Qt'
  1. Qt
  2. QTBUG-100234

Optimize our string searching and comparison code

    XMLWordPrintable

Details

    • Epic
    • Resolution: Unresolved
    • P2: Important
    • None
    • None
    • None
    • String Search And Compare
    • 4411406d9 (dev)

    Description

      We still have several inefficiencies in our string searching and comparison functions. This is an attempt to catalogue them, so that we may finally get them resolved.

      As a general guiding principle, all search and comparison functions must eventually

      • be noexcept, that is, not have preconditions and not allocate memory, and
      • not do the same work twice.

      That means, in particular, no recoding of charsets (unless into a statically-sized buffer), and no repeated building of BM tables.

      There are some conflicting issues in this epic. E.g., in order to implement QTBUG-100239, we need dynamically-sized storage to hold the the case-folded needle, which conflicts with not allocating memory.

      My idea to reconcile these is as follows:

      1. if the needle is below the threshold for BM (currently, six characters, but we should centralize that), do the case-folding into a statically-sized buffer
      2. if the needle is above the threshold, then we obviously need to use BM, but we can't, in general, store the full case-folded needle. But
        • I think we can build the BM tables for the full needle, while storing only the first, say, 256 characters case-folded, in a statically-sized buffer
        • when matching, we therefore will only get candidates, which we then need to confirm by matching the tail end (past the first 256 characters, if needle is longer)
      3. While I would love to use the STL's boyer_moore_searcher, it's unfortunately not noexcept, so we need to write new private BM matchers, which should separate the table generation from the table use, so as to decouple the charset used for building BM tables from the one used to match against.

      Attachments

        For Gerrit Dashboard: QTBUG-100234
        # Subject Branch Project Status CR V

        Activity

          People

            cnn Qt Core & Network
            mmutz Marc Mutz
            Vladimir Minenko Vladimir Minenko
            Alex Blasche Alex Blasche
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

              Created:
              Updated:

              Gerrit Reviews

                There are no open Gerrit changes