  1. Qt
  2. QTBUG-103043

Research UTF-8 case-(in)sensitive Boyer-Moore search algorithms

Details

    • Task
    • Resolution: Unresolved
    • P2: Important

    Description

      With UTF-8 being the mandated 8-bit encoding in Qt 6, we need to up our game when it comes to UTF-8 support. One area where we can improve is case-insensitive and case-sensitive UTF-8 search. We're not using Boyer-Moore for either of them, it seems.

      Should be a nice research project, since I, at least, have no idea where to begin with a case-insensitive UTF-8 Boyer-Moore algorithm. For case-sensitive search, we can probably use QByteArrayMatcher.
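
      The case-sensitive half can be handled byte-wise because UTF-8 is self-synchronizing: a valid needle never starts on a continuation byte, so any byte-level match in a valid haystack begins and ends on a code point boundary. The sketch below illustrates this with std::boyer_moore_horspool_searcher (C++17) as a stand-in for QByteArrayMatcher, purely to stay free of Qt dependencies. The function name findUtf8 is hypothetical, and both arguments are assumed to be valid UTF-8.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string_view>

// Byte-wise Boyer-Moore-Horspool search over UTF-8 data.
// Returns the byte offset of the first match, or npos if there is none.
// Assumption: haystack and needle are both valid UTF-8.
std::size_t findUtf8(std::string_view haystack, std::string_view needle)
{
    if (needle.empty())
        return 0; // the empty needle matches at the start

    const std::boyer_moore_horspool_searcher searcher(needle.begin(), needle.end());
    const auto it = std::search(haystack.begin(), haystack.end(), searcher);
    if (it == haystack.end())
        return std::string_view::npos;

    // With valid input, a match never starts on a continuation byte (10xxxxxx).
    assert((static_cast<unsigned char>(*it) & 0xC0) != 0x80);
    return static_cast<std::size_t>(it - haystack.begin());
}
```

      For example, searching "na\xC3\xAFve caf\xC3\xA9" for "caf\xC3\xA9" yields byte offset 7. No decoding is needed, so the skip table stays a plain 256-entry byte table.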

      This is just about researching the state of the art in UTF-8 searching. There's always the obvious algorithm, which parses the next multi-byte sequence (MBS) into a char32_t and looks up its hash in the skip table. But maybe there's something more clever (not necessarily BM-based).
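
      As a baseline for comparison, the "obvious algorithm" could look roughly like the sketch below: a Horspool search over decoded code points with a hash-based skip table. All names (CodePoint, decodeUtf8, foldAscii, findCodePoints) are hypothetical, the input is assumed to be valid UTF-8, and the fold function is an ASCII-only toy; real case-insensitive matching would need the Unicode case folding data, and full folding (e.g. "ß" to "ss") does not map one code point to one code point, which this sketch does not handle. For simplicity it decodes the whole haystack up front instead of decoding lazily.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <unordered_map>
#include <vector>

struct CodePoint {
    char32_t value;     // decoded (and possibly folded) code point
    std::size_t offset; // byte offset of its first byte in the input
};

// Decodes UTF-8 into code points. Assumption: input is valid, no checks beyond the assert.
std::vector<CodePoint> decodeUtf8(std::string_view s)
{
    std::vector<CodePoint> out;
    for (std::size_t i = 0; i < s.size(); ) {
        const auto b = static_cast<unsigned char>(s[i]);
        assert((b & 0xC0) != 0x80); // a sequence never starts on a continuation byte
        const std::size_t len = b < 0x80 ? 1 : b < 0xE0 ? 2 : b < 0xF0 ? 3 : 4;
        char32_t cp = len == 1 ? b : b & (0xFF >> (len + 1));
        for (std::size_t k = 1; k < len && i + k < s.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        out.push_back({cp, i});
        i += len;
    }
    return out;
}

// Toy case fold covering ASCII only (an assumption made for this sketch).
char32_t foldAscii(char32_t c)
{
    return (c >= U'A' && c <= U'Z') ? c + (U'a' - U'A') : c;
}

// Horspool search over code points. Returns the byte offset of the first
// match in haystack, or npos. Pass a fold function for case-insensitive search.
std::size_t findCodePoints(std::string_view haystack, std::string_view needle,
                           char32_t (*fold)(char32_t) = nullptr)
{
    auto hay = decodeUtf8(haystack);
    auto pat = decodeUtf8(needle);
    if (fold) {
        for (auto &c : hay) c.value = fold(c.value);
        for (auto &c : pat) c.value = fold(c.value);
    }
    const std::size_t n = hay.size(), m = pat.size();
    if (m == 0)
        return 0;

    // Skip table: distance from the last occurrence of each code point
    // (excluding the final pattern position) to the end of the pattern.
    std::unordered_map<char32_t, std::size_t> skip;
    for (std::size_t i = 0; i + 1 < m; ++i)
        skip[pat[i].value] = m - 1 - i;

    for (std::size_t pos = 0; pos + m <= n; ) {
        std::size_t j = m;
        while (j > 0 && hay[pos + j - 1].value == pat[j - 1].value)
            --j; // compare the window right to left
        if (j == 0)
            return hay[pos].offset;
        const auto it = skip.find(hay[pos + m - 1].value);
        pos += it == skip.end() ? m : it->second; // unseen code points skip the full length
    }
    return std::string_view::npos;
}
```

      The hash map replaces the 256-entry byte table because the alphabet is now the full code point range. Whether the decoding cost outweighs the longer skips is exactly the kind of question the research would have to answer.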

            People

              cnn Qt Core & Network
              mmutz Marc Mutz
              Vladimir Minenko Vladimir Minenko
              Alex Blasche Alex Blasche

                Gerrit Reviews

                  There are no open Gerrit changes