  1. Qt
  2. QTBUG-103043

Research UTF-8 case-(in)sensitive Boyer-Moore search algorithms


    • Type: Task
    • Resolution: Unresolved
    • Priority: P2: Important
    • 5
    • Foundation Sprint 124, Foundation Sprint 125, Foundation Sprint 126, Foundation Sprint 127, Foundation Sprint 129

      With UTF-8 being the mandated 8-bit encoding in Qt 6, we need to up our game when it comes to UTF-8 support. One area where we can improve is case-insensitive and case-sensitive UTF-8 search. We're not using Boyer-Moore for either of them, it seems.

      Should be a nice research project, since I, at least, have no idea where to begin with a case-insensitive UTF-8 Boyer-Moore algorithm. For the case-sensitive search, we can probably use QByteArrayMatcher.
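
      A byte-wise matcher is plausible for the case-sensitive search because UTF-8 is self-synchronizing: lead bytes and continuation bytes never overlap, so a well-formed needle can only match a well-formed haystack at code point boundaries. The sketch below illustrates this with the C++17 standard library's std::boyer_moore_searcher as a stand-in for QByteArrayMatcher. The function name findUtf8Bytes is hypothetical and not part of Qt.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string_view>

// Hypothetical helper: byte offset of the first case-sensitive match of
// `needle` in `haystack`, or std::string_view::npos if there is none.
// Assumption: both arguments hold well-formed UTF-8.
std::size_t findUtf8Bytes(std::string_view haystack, std::string_view needle)
{
    if (needle.empty())
        return 0; // by convention, the empty needle matches at offset 0

    // Plain Boyer-Moore over bytes; no decoding is needed.
    const std::boyer_moore_searcher searcher(needle.begin(), needle.end());
    const auto it = std::search(haystack.begin(), haystack.end(), searcher);
    return it == haystack.end()
        ? std::string_view::npos
        : static_cast<std::size_t>(it - haystack.begin());
}
```

      The returned offset is in bytes, not code points. Case-insensitive matching does not fit this scheme directly, because case variants of a code point can differ in every byte and even in encoded length.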

      This is just about researching the state of the art in UTF-8 searching. There's always the obvious algorithm, which decodes the next multi-byte sequence (MBS) into a char32_t and looks up its hash in the skip table. But maybe there's something more clever (not necessarily BM-based).

        1. uppersizetest.zip
          5 kB
          Matthias Rauter

            Matthias Rauter (matthias_rauter)
            Marc Mutz (mmutz)
            Vladimir Minenko
            Alex Blasche
            Votes: 0
            Watchers: 3


                There are no open Gerrit changes