Type: Task
Resolution: Unresolved
Priority: P3: Somewhat important
Fix Version/s: None
Affects Version/s: None
Component/s: Core: QString and Unicode
Labels: None
Platform/s: All
Epic Link: full-unicode
Commits: 0e67553aa (dev), 87748e110 (tqtc/lts-6.5), 541ff8e15 (dev), 55ea4ebe3 (6.9), f981baf3d (6.8)

Unicode 15.1 introduced new rules for line breaking (https://www.unicode.org/reports/tr14/tr14-51.html#LB28a) and grapheme clustering (https://www.unicode.org/reports/tr29/#GB9c) in Indic scripts. Neither rule is trivial to add to the existing code, because their form differs from that of the rules already supported. The grapheme clustering rule also requires extracting data from another Unicode data file (the Indic_Conjunct_Break property from DerivedCoreProperties.txt). In addition, Qt already implements custom segmentation rules for some of these scripts in qunicodetools.cpp.
It would be good to adapt the segmentation code to use the Unicode data instead of the custom tables if possible. This would also allow using the Unicode test data for Indic scripts.
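
To illustrate why GB9c has a different shape from the pairwise rules, here is a minimal sketch in plain C++. It is not Qt's code: the `InCB` enum and `gb9cForbidsBreak()` are hypothetical names, and the input is assumed to be a vector holding the Indic_Conjunct_Break value of each code point. The rule forbids a break before a Consonant only if it is preceded by a Consonant followed by a run of Extend/Linker characters that contains at least one Linker, so a look at just the two neighbouring characters is not enough.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the Indic_Conjunct_Break (InCB) property values.
enum class InCB { None, Consonant, Extend, Linker };

// Returns true if GB9c forbids a grapheme cluster break between
// props[pos - 1] and props[pos]:
//   Consonant [Extend Linker]* Linker [Extend Linker]* x Consonant
bool gb9cForbidsBreak(const std::vector<InCB> &props, std::size_t pos)
{
    if (pos == 0 || pos >= props.size() || props[pos] != InCB::Consonant)
        return false;

    bool sawLinker = false;
    std::size_t i = pos;
    // Walk backwards over the run of Extend and Linker characters.
    while (i > 0) {
        const InCB p = props[i - 1];
        if (p == InCB::Linker)
            sawLinker = true;
        else if (p != InCB::Extend)
            break;
        --i;
    }
    // The run must contain a Linker and be preceded by a Consonant.
    return sawLinker && i > 0 && props[i - 1] == InCB::Consonant;
}
```

For example, the property sequence Consonant, Linker, Consonant (such as KA, VIRAMA, KA in Devanagari) yields no break before the second consonant, while Consonant, Extend, Consonant does not match the rule. The backward scan keeps the sketch short; a real implementation would more likely carry the "seen Consonant" and "seen Linker" state forward while iterating.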

For now I'm going to disable the parts of tst_qtextboundaryfinder that depend on those new rules.

Strangely, the Unicode line breaking and grapheme clustering rules handle disjoint sets of scripts. Line breaking seems to support Balinese, Javanese, Brahmi, Grantha, Dives Akuru and Kawi. InCB properties are set for Devanagari, Bengali, Gujarati, Oriya, Telugu and Malayalam (except for the Extend value, which is set for more scripts, some of which are not Indic/Brahmic).
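
As a rough idea of what extracting the InCB data involves, here is a sketch of a line parser in plain C++. It assumes the InCB entries in DerivedCoreProperties.txt have the layout `0915..0939 ; InCB; Consonant # comment` (a range or single code point, the property name, then the value). `InCBRange` and `parseInCBLine()` are hypothetical names, and this is not how Qt's data generator is written.

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical record for one InCB entry.
struct InCBRange {
    char32_t first;
    char32_t last;
    std::string value; // "Consonant", "Extend" or "Linker"
};

static std::string trimmed(const std::string &s)
{
    const auto b = s.find_first_not_of(" \t");
    if (b == std::string::npos)
        return {};
    const auto e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

// Parses one line; returns nullopt for comments, blank lines,
// other properties and malformed input.
std::optional<InCBRange> parseInCBLine(std::string line)
{
    line = line.substr(0, line.find('#')); // drop the trailing comment
    const auto semi1 = line.find(';');
    if (semi1 == std::string::npos)
        return std::nullopt;
    const auto semi2 = line.find(';', semi1 + 1);
    if (semi2 == std::string::npos)
        return std::nullopt;
    if (trimmed(line.substr(semi1 + 1, semi2 - semi1 - 1)) != "InCB")
        return std::nullopt;

    const std::string range = trimmed(line.substr(0, semi1));
    const std::string value = trimmed(line.substr(semi2 + 1));
    if (range.empty() || value.empty())
        return std::nullopt;

    const auto dots = range.find("..");
    try {
        const std::string firstHex = range.substr(0, dots);
        const std::string lastHex =
            dots == std::string::npos ? firstHex : range.substr(dots + 2);
        return InCBRange{char32_t(std::stoul(firstHex, nullptr, 16)),
                         char32_t(std::stoul(lastHex, nullptr, 16)), value};
    } catch (const std::exception &) {
        return std::nullopt; // not a hexadecimal code point
    }
}
```

Collecting the returned ranges per value would make it straightforward to check which scripts actually have Consonant and Linker entries, as opposed to the much wider set that has Extend.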

Qt has special rules for Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam and Sinhala, so using only Unicode grapheme clustering would be a regression. There are also separate rules for Thai, Tibetan, Myanmar and Khmer.

resulted from: QTBUG-121529 Update UCD to version 32 (Unicode 15.1) (Closed)

For Gerrit Dashboard: QTBUG-121907
#	Subject	Branch	Project	Status	CR	V
536606,9	tst_qttextboundaryfinder: ignore unsupported tests	dev	qt/qtbase	MERGED	+2	0
625387,2	Re-enable some TextBoundaryFinder test data	dev	qt/qtbase	MERGED	+2	+1
625402,2	tst_qttextboundaryfinder: ignore unsupported tests	tqtc/lts-6.5	qt/tqtc-qtbase	MERGED	+2	0
637358,2	Re-enable some TextBoundaryFinder test data	6.9	qt/qtbase	MERGED	+2	0
639740,2	Re-enable some TextBoundaryFinder test data	6.8	qt/qtbase	MERGED	+2	0