Unicode 14.0.0

2021 September 14 (Announcement)

Version 14.0.0 has been superseded by the latest version of the Unicode Standard.

This page summarizes the important changes for the Unicode Standard, Version 14.0.0. This version supersedes all previous versions of the Unicode Standard.

A. Summary
B. Technical Overview
C. Stability Policy Update
D. Textual Changes and Character Additions
E. Conformance Changes
F. Changes in the Unicode Character Database
G. Changes in the Unicode Standard Annexes
H. Changes in Synchronized Unicode Technical Standards
M. Implications for Migration

A. Summary

Unicode 14.0 adds 838 characters, for a total of 144,697 characters. These additions include 5 new scripts, for a total of 159 scripts, as well as 37 new emoji characters.

The new scripts and characters in Version 14.0 add support for lesser-used languages and unique writing requirements worldwide, including numerous symbol additions. Funds from the Adopt-a-Character program provided support for some of these additions. The new scripts and characters include:

Popular symbol additions:

Other symbol and notational additions include:

Support for CJK unified ideographs was enhanced in Version 14.0 by significant corrections and improvements to the Unihan database. Changes to the Unihan database include updated source lists, regular expressions, and new and updated fields. See UAX #38, Unicode Han Database (Unihan) for more information on the updates.

Additional support for lesser-used languages and scholarly work was extended, including:

Important chart font updates, including:

Synchronization

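Several other important Unicode specifications have been updated for Version 14.0. The following four Unicode Technical Standards are versioned in synchrony with the Unicode Standard, because their data files cover the same repertoire. All have been updated to Version 14.0: UTS #10, Unicode Collation Algorithm; UTS #39, Unicode Security Mechanisms; UTS #46, Unicode IDNA Compatibility Processing; and UTS #51, Unicode Emoji.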

Some of the changes in Version 14.0 and associated Unicode Technical Standards may require modifications to implementations. For more information, see the migration and modification sections of UTS #10, UTS #39, UTS #46, and UTS #51.

See Sections D through H below for additional details regarding the changes in this version of the Unicode Standard, its associated annexes, and the other synchronized Unicode specifications.

B. Technical Overview

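Version 14.0 of the Unicode Standard consists of the core specification, the code charts, the Unicode Standard Annexes, and the Unicode Character Database.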

The core specification gives the general principles, requirements for conformance, and guidelines for implementers. The code charts show representative glyphs for all the Unicode characters. The Unicode Standard Annexes supply detailed normative information about particular aspects of the standard. The Unicode Character Database supplies normative and informative data for implementers to allow them to implement the Unicode Standard.

Core Specification

The core specification is available as a single PDF for viewing (14 MB). Links are also available in the navigation bar on the left of this page to access individual chapters and appendices of the core specification. It is also available as Print-on-Demand (POD) for purchase: Volume 1 and Volume 2.

Code Charts

Several sets of code charts are available. They serve different purposes:

For Unicode 14.0.0 in particular two additional sets of code chart pages are provided:

The delta and archival code charts are a stable part of this release of the Unicode Standard. They will never be updated.

The old, frozen UCS2003 source column has been removed from the multi-column display for CJK Unified Ideographs Extension B for Version 14.0.0. For permanent reference, a single source display of UCS2003 (8.7 MB) for the CJK Unified Ideographs Extension B has been provided as part of the Version 13.0.0 archival charts.

Unicode Standard Annexes

Links to the individual Unicode Standard Annexes are available in the navigation bar on the left of this page. The list of significant changes in the content of the Unicode Standard Annexes for Version 14.0 can be found in Section G below.

Unicode Character Database

Data files for Version 14.0 of the Unicode Character Database are available. The ReadMe.txt in that directory provides a roadmap to the functions of the various subdirectories. Zipped versions of the UCD for bulk download are also available.
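
As a rough illustration of how an implementation might read these data files, the following Python sketch parses one record in the semicolon-separated, 15-field layout of UnicodeData.txt. The function name `parse_unicode_data_line`, the `Record` type, and the choice of fields to keep are assumptions made for this example. The sample line is written to follow that layout and should be checked against the actual data file; UAX #44 remains the authority for the field definitions.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    """A few of the 15 fields of a UnicodeData.txt record (hypothetical type)."""
    code_point: int
    name: str
    general_category: str
    combining_class: int
    simple_lowercase: Optional[int]


def parse_unicode_data_line(line: str) -> Record:
    """Parse one semicolon-separated UnicodeData.txt record."""
    fields = line.rstrip("\n").split(";")
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}")
    lower = fields[13]  # Simple_Lowercase_Mapping; empty when there is none
    return Record(
        code_point=int(fields[0], 16),
        name=fields[1],
        general_category=fields[2],
        combining_class=int(fields[3]),
        simple_lowercase=int(lower, 16) if lower else None,
    )


# Sample record for one of the Vithkuqi letters added in Version 14.0.
SAMPLE = "10570;VITHKUQI CAPITAL LETTER A;Lu;0;L;;;;;N;;;;10597;"
record = parse_unicode_data_line(SAMPLE)
print(f"U+{record.code_point:04X} {record.name} ({record.general_category})")
```

Keeping only a handful of fields keeps the sketch short; a fuller reader would carry all 15 fields and also handle the ranges that the file encodes with paired First/Last entries.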

Version References

Version 14.0.0 of the Unicode Standard should be referenced as:

The Unicode Consortium. The Unicode Standard, Version 14.0.0, (Mountain View, CA: The Unicode Consortium, 2021. ISBN 978-1-936213-29-0)
http://www.unicode.org/versions/Unicode14.0.0/

The terms “Version 14.0” or “Unicode 14.0” are abbreviations for the full version reference, Version 14.0.0.

The citation and permalink for the latest published version of the Unicode Standard is:

The Unicode Consortium. The Unicode Standard.
http://www.unicode.org/versions/latest/

A complete specification of the contributory files for Unicode 14.0 is found on the page Components for 14.0.0. That page also provides the recommended reference format for Unicode Standard Annexes. For examples of how to cite particular portions of the Unicode Standard, see also the Reference Examples.

Errata

Errata incorporated into Unicode 14.0 are listed by date in a separate table. For corrigenda and errata after the release of Unicode 14.0, see the list of current Updates and Errata.

C. Stability Policy Update

There were no significant changes to the Stability Policy of the core specification between Unicode 13.0 and Unicode 14.0.

D. Textual Changes and Character Additions

Five new scripts were added with accompanying new block descriptions:

Script          Number of Characters
Vithkuqi        70
Old Uyghur      26
Cypro-Minoan    99
Tangsa          89
Toto            31
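
The table can be checked against the overall total with a few lines of arithmetic. The sketch below, with variable names chosen for this example, sums the per-script counts and subtracts them from the 838 characters added in this version to see how many additions fall outside the five new scripts.

```python
# Character counts for the five new scripts, copied from the table above.
new_script_counts = {
    "Vithkuqi": 70,
    "Old Uyghur": 26,
    "Cypro-Minoan": 99,
    "Tangsa": 89,
    "Toto": 31,
}

TOTAL_ADDED = 838  # characters added in Version 14.0, per this page

in_new_scripts = sum(new_script_counts.values())
outside_new_scripts = TOTAL_ADDED - in_new_scripts

print(in_new_scripts)       # 315
print(outside_new_scripts)  # 523
```

The 523 additions outside these five scripts cover the rest of the release, such as the symbol and emoji additions mentioned in the Summary.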

Changes in the Unicode Standard Annexes are listed in Section G.

Character Assignment Overview

838 characters have been added. Most character additions are in new blocks, but there are also character additions to a number of existing blocks. For details, see delta code charts.

E. Conformance Changes

There are no significant new conformance requirements in Unicode 14.0.

F. Changes in the Unicode Character Database

The detailed listing of all changes to the contributory data files of the Unicode Character Database for Version 14.0 can be found in UAX #44, Unicode Character Database. The changes listed there include character additions and property revisions to existing characters that will affect implementations. Some of the important impacts on implementations migrating from earlier versions of the standard are highlighted in Section M.

G. Changes in the Unicode Standard Annexes

In Version 14.0, some of the Unicode Standard Annexes have had significant revisions. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UAX, linked directly from the following list of UAXes.

Note that for Unicode 14.0, all pertinent links to URLs on the Unicode website in these Unicode Standard Annexes were updated to use the https protocol.

Unicode Standard Annex: Changes
UAX #9, Unicode Bidirectional Algorithm: Section 6.2, Vertical Text was clarified to indicate how the Bidirectional Algorithm is (or is not) used when text is laid out in vertical orientation.
UAX #11, East Asian Width: No significant changes in this version.
UAX #14, Unicode Line Breaking Algorithm: One redundant rule part was removed from LB27 in Section 6.1, Non-tailorable Line Breaking Rules. Also, LB30b was updated to include potential emoji.
UAX #15, Unicode Normalization Forms: No significant changes in this version.
UAX #24, Unicode Script Property: No significant changes in this version.
UAX #29, Unicode Text Segmentation: A Swedish "AIK:are" example was added to the word boundary discussion. The description of the charts in the auxiliary data files was updated, to make it more accurate. Other small editorial fixes were applied to the text.
UAX #31, Unicode Identifier and Pattern Syntax: Scripts new to Unicode 14.0 were added to the appropriate tables. A new Section 1.5, Notation, was added, referring to the LDML for the UnicodeSet notation used in this annex.
UAX #34, Unicode Named Character Sequences: No significant changes in this version.
UAX #38, Unicode Han Database (Unihan): The kCantonese field was redefined, and its description was updated accordingly. The new kStrange field was added. Regular expressions, source lists, and descriptions were updated for many other fields.
UAX #41, Common References for Unicode Standard Annexes: All references were updated for Unicode 14.0.
UAX #42, Unicode Character Database in XML: New code point attributes, values, and patterns were added for Unicode 14.0.
UAX #44, Unicode Character Database: The documentation was updated to describe the changes to the UCD for Version 14.0. The distinction between properties of strings and string-valued properties was clarified. A note was added clarifying that Vertical_Orientation defaults to U in some blocks associated with notational systems. An erroneous statement about which General_Category values can be associated with ccc≠0 was corrected.
UAX #45, U-Source Ideographs: Descriptions were added for new data fields (total strokes and first residual stroke) in the data file associated with UAX #45. The KangXi dictionary index field was obsoleted. New information was added about the submission process.
UAX #50, Unicode Vertical Text Layout: No significant changes in this version.

H. Changes in Synchronized Unicode Technical Standards

There are also significant revisions in the Unicode Technical Standards whose versions are synchronized with the Unicode Standard. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UTS, linked directly from the following list of UTSes.

Unicode Technical Standard: Changes
UTS #10, Unicode Collation Algorithm: No significant changes in this version.
UTS #39, Unicode Security Mechanisms: Section 3, Identifier Characters was adjusted to better introduce the topic of identifiers. The text in Section 3.1, General Security Profile for Identifiers was clarified regarding the rationales for restricting a character. The descriptions of identifier types in Table 1 were clarified.
UTS #46, Unicode IDNA Compatibility Processing: No significant changes in this version.
UTS #51, Unicode Emoji: The introduction was reworded. The definition of Basic_Emoji was clarified, and it was noted that emoji sets are binary properties of strings. In Section 2.6.2, Multi-Person Skin Tones, the handshake was added to the list of emoji with RGI skin tones.

M. Implications for Migration

A significant number of changes in Unicode 14.0 may affect implementations upgrading to Version 14.0 from earlier versions of the standard. The most important of these are listed and explained here, to help implementers focus on the issues most likely to cause unexpected trouble during upgrades.

Five new scripts have been added in Unicode 14.0.0. Some of these scripts have particular attributes which may cause issues for implementations. The more important of these attributes are summarized here.
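
One practical precaution when upgrading is to confirm which version of the Unicode Character Database a runtime actually ships before relying on Version 14.0 data. The Python sketch below uses the standard library's `unicodedata.unidata_version`; the helper name `supports_unicode` is an assumption made for this example, and other platforms expose the same information in their own ways.

```python
import unicodedata


def supports_unicode(major: int, minor: int = 0) -> bool:
    """Return True if this Python's bundled UCD is at least major.minor."""
    parts = unicodedata.unidata_version.split(".")
    bundled = (int(parts[0]), int(parts[1]))
    return bundled >= (major, minor)


if supports_unicode(14):
    # Characters from the new scripts should have names in this runtime.
    print(unicodedata.name("\U00010570", "<unnamed>"))
else:
    print(f"UCD {unicodedata.unidata_version} predates Unicode 14.0")
```

Comparing version tuples rather than strings avoids ordering mistakes such as treating "9.0.0" as newer than "14.0.0".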

Casing Issues

Numeric Property Issues

CJK/Unihan Changes

See UAX #38, Unicode Han Database (Unihan) for further details on these changes, especially Section 4.2, Listing by Date of Addition to the Unicode Standard, and Section 4.3, Listing by Location within Unihan.zip. UAX #38 also has updated regex values for numerous Unihan properties.
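
Implementations that validate Unihan data can read the tab-separated lines of the Unihan data files and match each value against the syntax pattern for its field. The Python sketch below shows the general shape of such a check. The function name `check_unihan_line`, the `PATTERNS` table, and the single pattern in it are illustrative assumptions; the authoritative regular expressions are those given in UAX #38 for Version 14.0.

```python
import re
from typing import Optional, Tuple

# Illustrative pattern only; take the real expressions from UAX #38.
PATTERNS = {
    "kTotalStrokes": re.compile(r"[1-9][0-9]{0,2}"),
}


def check_unihan_line(line: str) -> Optional[Tuple[str, str, bool]]:
    """Return (code point, field, is_valid) or None for comments and blanks."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    code_point, field, value = line.split("\t", 2)
    pattern = PATTERNS.get(field)
    if pattern is None:
        return code_point, field, True  # no pattern known: accept as-is
    # Fields may hold several space-separated values; each must match in full.
    valid = all(pattern.fullmatch(v) for v in value.split(" "))
    return code_point, field, valid


print(check_unihan_line("U+4E00\tkTotalStrokes\t1"))
print(check_unihan_line("U+4E00\tkTotalStrokes\t0"))
```

Because the expressions were updated for many fields in this version, a table like `PATTERNS` would need to be regenerated from UAX #38 during an upgrade rather than carried over from an earlier release.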

Emoji Changes

Code Charts

The old, frozen UCS2003 source column has been removed from the multi-column display for CJK Unified Ideographs Extension B for Version 14.0.0. For permanent reference, a single source display of UCS2003 (8.7 MB) for the CJK Unified Ideographs Extension B has been provided as part of the Version 13.0.0 archival charts. The rationale for this change is that the UCS2003 source was the source corresponding to the single column chart first printed in Unicode 4.0 in 2003. The glyphs for that single source had not tracked the extensive updates for characters in Extension B over the intervening years, and so in some cases were becoming misleading about the identity of some of the corrected characters in Extension B.

