Details
-
Task
-
Resolution: Unresolved
-
P2: Important
-
5
-
Foundation Sprint 124, Foundation Sprint 125, Foundation Sprint 126, Foundation Sprint 127, Foundation Sprint 128
Description
With UTF-8 being the mandated 8-bit encoding in Qt 6, we need to up our game when it comes to UTF-8 support. One area where we can improve is case-insensitive and case-sensitive UTF-8 search. We're not using Boyer-Moore for either of them, it seems.
Should be a nice research project, since I, at least, have no idea where to begin with a case-insensitive UTF-8 Boyer-Moore algorithm. For case-sensitive, we can probably use QByteArrayMatcher.
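To illustrate why a byte-oriented matcher is plausible for the case-sensitive case: well-formed UTF-8 is self-synchronizing, so a byte-wise match of a valid needle in a valid haystack can only start on a code point boundary. The sketch below uses the standard library's `std::boyer_moore_horspool_searcher` as a stand-in for QByteArrayMatcher so that it builds without Qt; `findUtf8CaseSensitive` is a hypothetical name, not an existing Qt API. It assumes both inputs are valid UTF-8 in the same normalization form; precomposed and decomposed spellings of the same text will not match each other.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string_view>

// Hypothetical helper (not a Qt API): returns the byte offset of the first
// occurrence of needle in haystack, or std::string_view::npos if absent.
// Both arguments are assumed to hold well-formed UTF-8.
std::size_t findUtf8CaseSensitive(std::string_view haystack, std::string_view needle)
{
    if (needle.empty())
        return 0;

    // Boyer-Moore-Horspool over raw bytes; no decoding is needed because a
    // lead byte can never be mistaken for a continuation byte.
    std::boyer_moore_horspool_searcher searcher(needle.begin(), needle.end());
    const auto it = std::search(haystack.begin(), haystack.end(), searcher);

    return it == haystack.end()
        ? std::string_view::npos
        : static_cast<std::size_t>(it - haystack.begin());
}
```

The returned offset is in bytes, not code points, which matches how a QByteArray-based API would report positions.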
This is just about researching the state of the art in UTF-8 searching. There's always the obvious algorithm, which parses the next multi-byte sequence (MBS) into a char32_t and looks up its hash in the skip table. But maybe there's something more clever (not necessarily BM-based).
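A minimal sketch of that obvious algorithm, under several loudly stated assumptions: `decodeNext`, `foldCase`, and `findUtf8` are hypothetical names; the case folding is ASCII-only as a placeholder for real Unicode folding; the variant is Horspool rather than full Boyer-Moore; and the haystack is decoded up front for simplicity, which gives up the sublinear behavior a real implementation would want. The skip table is hashed into 256 buckets by `cp % 256`, and colliding code points share the smaller shift, which is always safe.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string_view>
#include <vector>

// Decode one UTF-8 sequence starting at s[i] and advance i past it.
// Simplified: stray or truncated sequences yield U+FFFD; overlong forms
// and surrogates are not rejected.
char32_t decodeNext(std::string_view s, std::size_t &i)
{
    const auto b0 = static_cast<unsigned char>(s[i++]);
    if (b0 < 0x80)
        return b0;
    int extra = (b0 >= 0xF0) ? 3 : (b0 >= 0xE0) ? 2 : (b0 >= 0xC0) ? 1 : -1;
    if (extra < 0)
        return 0xFFFD; // stray continuation byte
    char32_t cp = b0 & (0x3F >> extra); // payload bits of the lead byte
    while (extra-- > 0) {
        if (i >= s.size() || (static_cast<unsigned char>(s[i]) & 0xC0) != 0x80)
            return 0xFFFD; // truncated sequence
        cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
    }
    return cp;
}

// Placeholder fold, ASCII only (an assumption for illustration).
char32_t foldCase(char32_t c)
{
    return (c >= U'A' && c <= U'Z') ? c + 32 : c;
}

// Returns the byte offset of the first match, or std::string_view::npos.
std::size_t findUtf8(std::string_view haystack, std::string_view needle, bool caseSensitive)
{
    auto norm = [&](char32_t c) { return caseSensitive ? c : foldCase(c); };

    std::vector<char32_t> pat;
    for (std::size_t i = 0; i < needle.size();)
        pat.push_back(norm(decodeNext(needle, i)));
    const std::size_t m = pat.size();
    if (m == 0)
        return 0;

    std::vector<char32_t> text;
    std::vector<std::size_t> offset; // byte offset of each decoded code point
    for (std::size_t i = 0; i < haystack.size();) {
        offset.push_back(i);
        text.push_back(norm(decodeNext(haystack, i)));
    }

    // Horspool skip table: distance from a code point's last occurrence in
    // pat[0..m-2] to the end of the pattern; m for everything else.
    std::array<std::size_t, 256> skip;
    skip.fill(m);
    for (std::size_t k = 0; k + 1 < m; ++k) {
        std::size_t &s = skip[pat[k] % 256];
        s = std::min(s, m - 1 - k);
    }

    for (std::size_t pos = 0; pos + m <= text.size(); pos += skip[text[pos + m - 1] % 256]) {
        if (std::equal(pat.begin(), pat.end(), text.begin() + pos))
            return offset[pos];
    }
    return std::string_view::npos;
}
```

Shifts here are counted in code points. Avoiding the up-front decode would mean translating a code point shift into a byte shift on the fly, and proper case folding can change the number of code points, which is presumably where the actual research effort lies.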
Issue Links
- is required for QTBUG-100238 Add UTF-8 case-(in)sensitive Boyer-Moore searcher
- Reported
-