Unicode 5.2.0

Released: 2009 October 1 (Announcement)

Version 5.2.0 of the Unicode Standard consists of the core specification (The Unicode Standard, Version 5.2), together with the delta and archival code charts for this version, the 5.2.0 Unicode Standard Annexes, and the 5.2.0 Unicode Character Database (UCD). The core specification gives the general principles, requirements for conformance, and guidelines for implementers. The code charts show representative glyphs for all the Unicode characters. The Unicode Standard Annexes supply detailed normative information about particular aspects of the standard. The Unicode Character Database supplies normative and informative data for implementers to allow them to implement the Unicode Standard.

Version 5.2.0 of the Unicode Standard should be referenced as:

The Unicode Consortium. The Unicode Standard, Version 5.2.0, defined by: The Unicode Standard, Version 5.2 (Mountain View, CA: The Unicode Consortium, 2009. ISBN 978-1-936213-00-9). (http://www.unicode.org/versions/Unicode5.2.0/)

A complete specification of the contributory files for Unicode 5.2.0 is found on the page Components for 5.2.0. That page also provides the recommended reference format for Unicode Standard Annexes.

Contents of This Document

A. Online Edition
B. Overview
C. Stability Policy Update
D. Character Additions
E. Conformance Changes
F. Unicode Character Database Changes
G. Unicode Standard Annex Changes

A. Online Edition

The text of The Unicode Standard, Version 5.2, as well as the delta and archival code charts, is available via the navigation links on this page. The charts and the Unicode Standard Annexes may be printed, while the other files may be viewed but not printed. The Unicode 5.2 Web Bookmarks page has links to all sections of the online text. A zipped version of the core specification (10 MB) is also available for download.

This page summarizes important changes to the standard from Unicode 5.1.0. The core specification and the Unicode Standard Annexes are not delta documents; they incorporate all of the textual changes for their updates for Version 5.2.0.

B. Overview

The Unicode Standard, Version 5.2, adds 6,648 characters and significantly improves the documentation of conformance requirements for the specification of normalization forms, canonical ordering, and the status of types of properties. Version 5.2 brings improved clarity of presentation in many Unicode Standard Annexes.

Seven new contemporary scripts have been added in Version 5.2: Bamum, Javanese, Lisu, Meetei Mayek, Samaritan, Tai Tham, and Tai Viet. New character additions to existing scripts now provide greater support for Abkhaz, Canadian Aboriginal Syllabics, Coptic, Devanagari, Khamti Shan, Malayalam, and Myanmar. Of particular note are Devanagari additions in support of Vedic Sanskrit. Encoding Vedic is significant because Sanskrit is one of the principal languages for the religious heritage of India, and because Vedic represents the earliest attested phase of the language.

The seven contemporary scripts and newly encoded individual characters expand support of language and orthographic communities in Africa, India, China, Central Asia, Southeast Asia, and the Middle East.

Other character additions include important modern use symbols and historic characters. With Unicode Version 5.2, scholars will now have access to the Gardiner set of Egyptian Hieroglyphs as well as other important historic scripts: Imperial Aramaic, Avestan, Kaithi, Old South Arabian, and Old Turkic. Several key symbol sets were added or expanded: the ARIB set of Japanese broadcasting symbols, additional number forms used in India, and currency symbols.

This latest version of the Unicode Standard has exactly the same character assignments as ISO/IEC 10646:2003 plus Amendments 1 through 6.

Unicode Version 5.2:


Errata incorporated into Unicode 5.2.0 are listed by date in a separate table. For corrigenda and errata after the release of Unicode 5.2.0, see the list of current Updates and Errata.

C. Stability Policy Update

The Unicode Character Encoding Stability Policy has been updated. This update strengthens normalization stability, adds stability policy for case pairs, and extends constraints on property values. For the current statement of these policies, see Unicode Character Encoding Stability Policy.

D. Character Additions

6,648 new character assignments were made to the Unicode Standard, Version 5.2.0 (over and above what was in Unicode 5.1.0). The character repertoire corresponds to ISO/IEC 10646:2003 plus Amendments 1 through 6.

The exact list of characters added for Version 5.2.0 is documented in the file DerivedAge.txt in the Unicode Character Database. Among the characters added, there are a few notable cases which may impact existing implementations. These cases are highlighted here, so that implementers can check for any problematical assumptions in their code.
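As a rough illustration of how an implementer might tally the additions from DerivedAge.txt, the following Python sketch counts code points per Age value. It assumes the usual UCD line layout (a code point or `first..last` range, a semicolon, the age, then an optional `#` comment). The function name `count_by_age` and the `SAMPLE` lines are hypothetical, written only to show the format, and are not a verbatim excerpt of the file.

```python
from collections import Counter


def count_by_age(derived_age_text):
    """Count code points per Age value in DerivedAge.txt-style text."""
    counts = Counter()
    for line in derived_age_text.splitlines():
        # Drop the trailing comment, then skip blank or comment-only lines.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        code_field, age = (part.strip() for part in data.split(";", 1))
        # A field is either a single code point or a "first..last" range.
        first, _, last = code_field.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        counts[age] += end - start + 1
    return counts


# Illustrative lines in the file's format (an assumption, not real data).
SAMPLE = """\
# comment lines and blank lines are ignored

0524..0525    ; 5.2 #   [2] a two-code-point range
0800..082D    ; 5.2 #  [46] a longer range
20B9          ; 6.0 #       a single code point
"""

sample_counts = count_by_age(SAMPLE)
print(sample_counts["5.2"])  # code points tagged 5.2 in the sample
```

Run against the full DerivedAge.txt for this version, the entry for `"5.2"` would be expected to match the 6,648 additions stated above, provided the file follows the layout assumed here.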

Character Assignment Overview

The new character additions were to both the BMP and the SMP (Plane 1). The following table shows the allocation of code points in Unicode 5.2.0. For more information on the specific characters, see the file DerivedAge.txt in the Unicode Character Database. For more details of character counts, see Appendix D, Changes from Previous Versions in Unicode 5.2.

Type            Count
Graphic         107,154
Format          142
Control         65
Private Use     137,468
Surrogate       2,048
Noncharacter    66
Reserved        867,169
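As a quick sanity check on the allocation figures, the seven categories should together account for every code point in the Unicode codespace, which spans 17 planes of 65,536 code points each (U+0000 through U+10FFFF). The sketch below sums the counts from the table. The names `ALLOCATION` and `CODESPACE_SIZE` are chosen here for illustration only.

```python
# Code point allocation in Unicode 5.2.0, copied from the table above.
ALLOCATION = {
    "Graphic": 107_154,
    "Format": 142,
    "Control": 65,
    "Private Use": 137_468,
    "Surrogate": 2_048,
    "Noncharacter": 66,
    "Reserved": 867_169,
}

# 17 planes (0 through 16) of 0x10000 code points each.
CODESPACE_SIZE = 17 * 0x10000

total_allocated = sum(ALLOCATION.values())
print(total_allocated, CODESPACE_SIZE)

# Every code point falls into exactly one category, so the sums must agree.
assert total_allocated == CODESPACE_SIZE
```

The same arithmetic shows that the Reserved count is simply whatever remains of the codespace after the other six categories are subtracted.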

E. Conformance Changes

There are several changes to conformance requirements in Unicode 5.2 that impact implementations. The most important of these are noted specifically here.

F. Unicode Character Database Changes

The detailed listing of all changes to the contributory data files of the Unicode Character Database for Version 5.2.0 can be found in UAX #44, Unicode Character Database. The most significant changes include:

G. Unicode Standard Annex Changes

In Version 5.2, many of the Unicode Standard Annexes have had significant revisions. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UAX, linked directly from the following list of UAXes.

