Details
- Epic
- Resolution: Unresolved
- P3: Somewhat important
- None
- 6.7
- None
- Generic Format Descriptors
Description
We currently have localized formatting for some of the types classified below, but most of the number formats are conflated or implemented incorrectly. In some cases this formatting takes into account the locale's choice of digit set [*]; for date-time parsing, at least, this is not handled reliably.
[*] We specify digit set by recording CLDR's zero digit; this is not robust (there are two traditional Chinese number systems that share a common zero). It would be better to have an enumeration for number systems and a separate CLDR-derived mapping from these to their digit sets. This could be done with only moderate reworking of QLocale and its CLDR scripts.
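As a rough illustration of the footnote's suggestion, here is a minimal sketch of an enumeration of number systems with a separate table mapping each to its digit set. The names `NumberSystem`, `DigitSet`, `digitSets` and `toDigits` are hypothetical and are not existing QLocale API. The enumerators borrow CLDR numbering-system identifiers, and the table is written by hand here where real code would generate it from CLDR data. The `Hanidec` row shows why recording only a zero digit is insufficient: its digits are not consecutive code points following the zero.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical enumeration; names follow CLDR numbering-system identifiers.
enum class NumberSystem : std::size_t { Latn, Arab, Deva, Hanidec, Count };

using DigitSet = std::array<char32_t, 10>;

// Hand-written stand-in for a table that would be generated from CLDR data.
constexpr std::array<DigitSet, static_cast<std::size_t>(NumberSystem::Count)> digitSets = {{
    {U'0', U'1', U'2', U'3', U'4', U'5', U'6', U'7', U'8', U'9'},
    {U'\u0660', U'\u0661', U'\u0662', U'\u0663', U'\u0664',
     U'\u0665', U'\u0666', U'\u0667', U'\u0668', U'\u0669'},
    {U'\u0966', U'\u0967', U'\u0968', U'\u0969', U'\u096A',
     U'\u096B', U'\u096C', U'\u096D', U'\u096E', U'\u096F'},
    // Not contiguous: these digits cannot be derived from the zero alone.
    {U'\u3007', U'\u4E00', U'\u4E8C', U'\u4E09', U'\u56DB',
     U'\u4E94', U'\u516D', U'\u4E03', U'\u516B', U'\u4E5D'},
}};

// Render a non-negative whole number using the digits of the given system.
std::u32string toDigits(unsigned long long value, NumberSystem system)
{
    const auto index = static_cast<std::size_t>(system);
    assert(index < digitSets.size());
    const DigitSet &digits = digitSets[index];
    std::u32string text;
    do {
        text.insert(text.begin(), digits[value % 10]);
        value /= 10;
    } while (value != 0);
    return text;
}
```

With a table like this, two systems that share a zero would simply be two distinct enumerators with their own rows.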
In all cases, certain delimiters appear in particular positions in relation to data-derived content – potentially-empty sequences of digits, abbreviations for or names of days, months, currencies, units – and there may be constraints on these (notably lengths of sequences of digits). Serializing or parsing based on such a pattern has large amounts of common structure, much of which may be better centralised to a common infrastructure, on which the serializers and parsers for various datatypes can be built.
Such a shared internal infrastructure may, potentially, provide a path towards supporting user-supplied formats to be used for custom formatting of particular data, or as parts of a user-configured locale description.
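To make the idea of common structure concrete, here is a minimal sketch of a tokenizer that splits a pattern into literal delimiters and data-derived fields, each field carrying a width that a serializer or parser could treat as a length constraint. The `Token` type, the `tokenize` function and the pattern syntax (runs of one ASCII letter are fields, single quotes enclose literal text) are assumptions made for illustration, loosely modelled on CLDR-style patterns. This is not an existing Qt API.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical token type shared by serializers and parsers.
struct Token {
    enum Kind { Literal, Field };
    Kind kind;
    char symbol;        // field letter; '\0' for literals
    std::size_t width;  // length of the letter run; 0 for literals
    std::string text;   // literal text; empty for fields
};

std::vector<Token> tokenize(std::string_view pattern)
{
    std::vector<Token> tokens;
    // Append literal text, merging it into a preceding literal token.
    auto addLiteral = [&tokens](std::string_view text) {
        if (text.empty())
            return;
        if (!tokens.empty() && tokens.back().kind == Token::Literal)
            tokens.back().text.append(text);
        else
            tokens.push_back({Token::Literal, '\0', 0, std::string(text)});
    };

    std::size_t i = 0;
    while (i < pattern.size()) {
        const char c = pattern[i];
        if (std::isalpha(static_cast<unsigned char>(c))) {
            // A run of the same letter forms one field.
            std::size_t end = i;
            while (end < pattern.size() && pattern[end] == c)
                ++end;
            assert(end > i);
            tokens.push_back({Token::Field, c, end - i, {}});
            i = end;
        } else if (c == '\'') {
            if (i + 1 < pattern.size() && pattern[i + 1] == '\'') {
                addLiteral("'"); // doubled quote outside quoted text
                i += 2;
            } else {
                // Quoted literal; an unterminated quote runs to the end.
                // Simplification: no escaped quotes inside quoted text.
                const std::size_t close = pattern.find('\'', i + 1);
                const std::size_t stop =
                    close == std::string_view::npos ? pattern.size() : close;
                addLiteral(pattern.substr(i + 1, stop - i - 1));
                i = close == std::string_view::npos ? pattern.size() : close + 1;
            }
        } else {
            addLiteral(pattern.substr(i, 1));
            ++i;
        }
    }
    return tokens;
}
```

For example, `tokenize("dd/MM/yyyy 'at' HH:mm")` yields nine tokens: five fields (`d`, `M`, `y`, `H`, `m`) and four literals, one of which is the merged text `" at "`. A serializer and a parser for any given datatype could both walk such a token list, differing only in how they interpret each field symbol.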
Types of data for which we could potentially use such format descriptions:
- Numbers
  - Plain numbers:
    - Whole (with choice of base, two through thirty-six)
    - Floating-point (with choice of precision, width, style; but not base – there's a case for hex)
  - Percentages
  - Currency amounts (vernacular or accountancy; positive or negative)
  - Amounts with (non-currency) units of measurement
    - We currently support byte amounts, but do it all wrong.
    - SI (kg, m, s) vs CGS (g, cm, s) units; or "imperial" (US vs UK).
    - All choices relevant to floating-point apply, plus short or long form for units.
    - Leave client code to supply unit name or abbreviation along with the number.
- Temporal
  - Dates, Times, Date-times
  - Time-zones - IANA names, MS names, Offset form (with/out colon), abbreviation
- Lists
  - Enclosures, separators
- Colour
  - HSV vs CMYK vs RGB vs …
  - Hex, decimal tuples, names.
- IP addresses: IPv4 (hex vs dotted-decimal), IPv6 (standard hex)
- …
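As an illustration of how descriptors for two of the categories above might compose, here is a minimal sketch pairing a whole-number descriptor (base only) with a list descriptor (enclosures and separator). `WholeFormat`, `ListFormat`, `formatWhole` and `formatList` are hypothetical names, not existing Qt API. Locale-specific digits, grouping and sign handling are deliberately left out, and the number conversion relies on the standard library's `std::to_chars`.

```cpp
#include <cassert>
#include <charconv>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical descriptor for whole numbers: only the base is modelled.
struct WholeFormat {
    int base = 10; // two through thirty-six
};

// Hypothetical descriptor for lists: enclosures plus a separator.
struct ListFormat {
    std::string open = "[";
    std::string separator = ", ";
    std::string close = "]";
};

std::string formatWhole(long long value, const WholeFormat &format)
{
    assert(format.base >= 2 && format.base <= 36);
    char buffer[72]; // room for 64 binary digits and a sign
    const auto result =
        std::to_chars(buffer, buffer + sizeof buffer, value, format.base);
    assert(result.ec == std::errc());
    return std::string(buffer, result.ptr);
}

// Format each item with the item descriptor, then wrap with the list one.
std::string formatList(const std::vector<long long> &values,
                       const WholeFormat &item, const ListFormat &list)
{
    std::string text = list.open;
    bool first = true;
    for (long long value : values) {
        if (!first)
            text += list.separator;
        first = false;
        text += formatWhole(value, item);
    }
    text += list.close;
    return text;
}
```

For instance, `formatList({1, 2, 3}, WholeFormat{}, ListFormat{"(", "; ", ")"})` produces `(1; 2; 3)`. Keeping the item descriptor separate from the list descriptor is one way the same list machinery could later serve dates, colours or other item types.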
We should not necessarily try to cover all these cases, but we should identify those we do want to cover and do so coherently, integrating with our string formatting (traditionally support for "%L1" and kindred; potentially some future std::format support). We should at least ensure that the way we do this for the cases we choose to support is compatible with later extension to those we set aside for now.
Our present format descriptions (at least for temporal data) lack the expressive power of CLDR's representations of formats and there are "impedance mismatches" between them that mean conversions to and fro are not faithful. Where possible, using CLDR's formats (or formats equivalent to them in expressiveness, with a well-defined transformation to and from them) is preferable to inventing our own formatting systems and then kludging a mapping from CLDR's to ours.
Issue Links
- relates to
  - QTBUG-104651 Impact of C++20 std::format on our code (In Progress)
  - QTBUG-82886 Qt::DefaultLocaleShortDate / QLocale::ShortFormat date parsing doesn't deal with 4-digit years (Open)
  - QTBUG-110669 Q/Date/Time: align unquoting format strings for both from/toString() (Open)