Update UCD for Unicode 15.0



      Run the util/unicode/ regenerator and check that the results work.
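
      One way to cross-check the regenerated data is an independent spot check, sketched below. This is not part of util/unicode/. It uses Python's standard unicodedata module as a second opinion, and it is only meaningful when that Python build ships UCD 15.0 or newer. The sample code points and the helper names (SAMPLE_15_0, runtime_ucd_version, spot_check) are illustrative assumptions, not taken from this change.

```python
import unicodedata

# Hand-picked code points first assigned in Unicode 15.0 (illustrative, not exhaustive).
SAMPLE_15_0 = {
    0x1D2C0: "first code point of the Kaktovik Numerals block",
    0x1FA77: "pink heart emoji",
    0x31350: "first ideograph of CJK Unified Ideographs Extension H",
}


def runtime_ucd_version():
    """Return the UCD version bundled with this Python as a tuple of ints."""
    return tuple(int(part) for part in unicodedata.unidata_version.split("."))


def spot_check():
    """Map each sample code point to True if this Python's UCD has it assigned."""
    # General category "Cn" means "unassigned" in the bundled UCD.
    return {cp: unicodedata.category(chr(cp)) != "Cn" for cp in SAMPLE_15_0}


if __name__ == "__main__":
    if runtime_ucd_version() < (15, 0, 0):
        print("Bundled UCD is older than 15.0; the spot check is not meaningful here.")
    else:
        for cp, assigned in spot_check().items():
            status = "assigned" if assigned else "MISSING"
            print(f"U+{cp:04X} ({SAMPLE_15_0[cp]}): {status}")
```

      The same sample code points can then be fed to the project's own lookup functions to confirm that the regenerated tables agree.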


      For convenience, a copy of the summary from https://www.unicode.org/versions/Unicode15.0.0/ follows.


      Unicode 15.0 adds 4,489 characters, for a total of 149,186 characters. These additions include 2 new scripts, for a total of 161 scripts, along with 20 new emoji characters, and 4,193 CJK (Chinese, Japanese, and Korean) ideographs.

      The new scripts and characters in Version 15.0 add support for lesser-used languages and unique written requirements worldwide, including numerous symbol additions. Funds from the Adopt-a-Character program provided support for some of these additions. The new scripts and characters include:

      • Nag Mundari, a modern script used to write Mundari, a language spoken in India
      • A Kannada character used to write Konkani, Awadhi, and Havyaka Kannada in India
      • Kaktovik numerals, devised by speakers of Iñupiaq in Kaktovik, Alaska for the counting systems of the Inuit and Yupik languages

      Popular symbol additions:

      • 20 emoji characters, including hair pick, maracas, jellyfish, khanda, and pink heart. For complete statistics regarding all emoji as of Unicode 15.0, see Emoji Counts. For more information about emoji additions in version 15.0, including new emoji ZWJ sequences and emoji modifier sequences, see Emoji Recently Added, v15.0.

      Other symbol and notational additions include:

      • The nine pointed white star symbol, used by members of the Bahá’í faith
      • Eight symbols for celestial bodies, used by astronomers and astrologers
      • Twenty-nine additional Egyptian hieroglyph format controls, which will enable Egyptologists to better represent texts

      Support for other languages and scholarly work worldwide includes:

      • Kawi, a historical script found in Southeast Asia, used to write Old Javanese and other languages
      • Three additional characters for the Arabic script to support Quranic marks used in Turkey
      • One new Lao sign used to write Lao Pali
      • Three Khojki characters found in handwritten and printed documents
      • Ten Devanagari characters used to represent auspicious signs found in inscriptions and manuscripts
      • Six Latin letters used in Malayalam transliteration
      • Sixty-three Cyrillic modifier letters used in phonetic transcription
      • One additional Egyptian hieroglyph

      Updates to the CJK blocks add:

      • 4,192 ideographs in the new CJK Unified Ideographs Extension H block
      • One ideograph in the CJK Unified Ideographs Extension C block
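
      The two CJK figures above can be reconciled with the headline count of 4,193 ideographs by a short calculation. The sketch below assumes the Extension H block spans U+31350..U+323AF and that every code point in that range is assigned in 15.0. The range comes from the UCD's Blocks.txt, not from this summary, and the constant names are illustrative.

```python
# Assumed block range for CJK Unified Ideographs Extension H (per Blocks.txt).
EXT_H_FIRST = 0x31350
EXT_H_LAST = 0x323AF

# Inclusive range size; assumes the block is fully assigned in Unicode 15.0.
ext_h_count = EXT_H_LAST - EXT_H_FIRST + 1

# The summary lists one further ideograph added to Extension C.
ext_c_added = 1

total_new_ideographs = ext_h_count + ext_c_added

print(ext_h_count)           # 4192
print(total_new_ideographs)  # 4193
```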

      Support for CJK unified ideographs was enhanced in Version 15.0 by significant corrections and improvements to the Unihan database. Changes to the Unihan database include updated source lists, regular expressions, and new and updated fields. See UAX #38, Unicode Han Database (Unihan) for more information on the updates.

      Important chart font updates include:

      • A set of updated glyphs for Egyptian hieroglyphs, in addition to standardized variation sequences to support rotated glyphs found in texts
      • Improved glyphs for Unified Canadian Aboriginal Syllabics, which provide better support for Carrier and other languages
      • A new Wancho font, with improved and simplified shapes


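      Several other important Unicode specifications have been updated for Version 15.0. The following four Unicode Technical Standards are versioned in synchrony with the Unicode Standard, because their data files cover the same repertoire. All have been updated to Version 15.0:

      • UTS #10, Unicode Collation Algorithm
      • UTS #39, Unicode Security Mechanisms
      • UTS #46, Unicode IDNA Compatibility Processing
      • UTS #51, Unicode Emoji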

      Some of the changes in Version 15.0 and associated Unicode Technical Standards may require modifications to implementations. For more information, see the migration and modification sections of UTS #10, UTS #39, UTS #46, and UTS #51.

      See Sections D through H of the page linked above for additional details regarding the changes in this version of the Unicode Standard, its associated annexes, and the other synchronized Unicode specifications.



