Unicode 6.1.0

Released: 2012 January 31 (Announcement)

Version 6.1.0 has been superseded by the latest version of the Unicode Standard.

Unicode 6.1.0 is a minor version of the Unicode Standard. This page summarizes the important changes for the Unicode Standard, Version 6.1.0. In the discussion below, Version 6.1.0 may be abbreviated as "Unicode 6.1" or "Version 6.1."


Contents of This Document

A. Summary
B. Version Information
C. Stability Policy Update
D. Textual Changes and Character Additions
E. Conformance Changes
F. Unicode Character Database Changes
G. Unicode Standard Annex Changes

A. Summary

Version 6.1 of the Unicode Standard continues the Unicode Consortium's long-term commitment to support the full diversity of languages around the world. This latest version adds characters to support additional languages of China, other Asian countries, and Africa. It also addresses educational needs in the Arabic-speaking world. A total of 732 new characters have been added.

This version of the Standard also brings technical improvements to support implementers. Changes to property values and their aliases give properties labels that are easier to use systematically in programs. The new labels, combined with a new script extensions property, mean that regular expressions can be more straightforward and easier to validate. The Hangul algorithms were consolidated and restructured: where one previously had to examine four separate documents, the information is now consolidated in the core specification in Chapter 3, Conformance.

Over 200 new Standardized Variants have been added for emoji characters, allowing implementations to distinguish preferred display styles between text and emoji styles. For example:

U+26FA U+FE0E    TENT, text style
U+26FA U+FE0F    TENT, emoji style
U+26FD U+FE0E    FUEL PUMP, text style
U+26FD U+FE0F    FUEL PUMP, emoji style
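
As a minimal sketch of how such a sequence is formed, the following Python snippet appends a variation selector to a base character. The helper names `with_style` and `code_points` are hypothetical, introduced only for illustration. Whether the requested style is actually honored depends on the font and rendering system in use.

```python
# Variation selectors used in the example sequences above.
VS15_TEXT = "\uFE0E"   # U+FE0E VARIATION SELECTOR-15: requests text style
VS16_EMOJI = "\uFE0F"  # U+FE0F VARIATION SELECTOR-16: requests emoji style


def with_style(base: str, emoji: bool) -> str:
    """Return `base` followed by the selector for the requested style.

    Hypothetical helper; it does not check that the resulting sequence
    is one of the standardized variants.
    """
    return base + (VS16_EMOJI if emoji else VS15_TEXT)


def code_points(text: str) -> str:
    """Format a string as space-separated hexadecimal code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


tent_text = with_style("\u26FA", emoji=False)
tent_emoji = with_style("\u26FA", emoji=True)
print(code_points(tent_text))   # 26FA FE0E
print(code_points(tent_emoji))  # 26FA FE0F
```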

Among the notable property changes and additions in Unicode 6.1 are two new line break property values, which improve the line-breaking behavior of Hebrew and Japanese text. Segmentation behavior was also improved for Thai, Lao, and similar languages. The processing of Chinese data has been augmented by more fully specified information on mapping between Simplified and Traditional Chinese characters, along with other improvements to the Unihan data.

For detailed property changes see Section F. Unicode Character Database Changes.

Version 6.1 has minor conformance updates, including the determination of grapheme cluster boundaries and the processing of canonical combining class and decomposition mapping. There are documentation improvements throughout.

Two other important Unicode specifications are maintained in synchrony with the Unicode Standard, and have updates for Version 6.1:

This version of the Unicode Standard is synchronized in repertoire with the forthcoming third edition of 10646: ISO/IEC 10646:2012.

B. Version Information

Version 6.1 of the Unicode Standard consists of the core specification, the delta and archival code charts for this version, the Unicode Standard Annexes, and the Unicode Character Database (UCD).

The core specification gives the general principles, requirements for conformance, and guidelines for implementers. The code charts show representative glyphs for all the Unicode characters. The Unicode Standard Annexes supply detailed normative information about particular aspects of the standard. The Unicode Character Database supplies normative and informative data for implementers to allow them to implement the Unicode Standard.

Version 6.1.0 of the Unicode Standard should be referenced as:

The Unicode Consortium. The Unicode Standard, Version 6.1.0, (Mountain View, CA: The Unicode Consortium, 2012. ISBN 978-1-936213-02-3)
http://www.unicode.org/versions/Unicode6.1.0/

A complete specification of the contributory files for Unicode 6.1 is found on the page Components for 6.1.0. That page also provides the recommended reference format for Unicode Standard Annexes.

The navigation bar on the left of this page provides links to the core specification as a single file, to its individual chapters, and to the appendices. Also provided are links to the code charts, the radical-stroke indices to CJK ideographs, the Unicode Standard Annexes, and the data files for Version 6.1 of the Unicode Character Database.

Code Charts

Several sets of code charts are available. They serve different purposes:

For Unicode 6.1.0 in particular two additional sets of code chart pages are provided:

The delta and archival code charts are a stable part of this release of the Unicode Standard. They will never be updated.

Errata

Errata incorporated into Unicode 6.1 are listed by date in a separate table. For corrigenda and errata after the release of Unicode 6.1, see the list of current Updates and Errata.

C. Stability Policy Update

The stability policy which limits the range of possible Canonical_Combining_Class property values was narrowed to 0..254, from its former range of 0..255. This has the effect of permanently reserving the value 255, which can then be used by implementations for possible optimizations of table building.
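
One way an implementation might take advantage of the reserved value is as an "unset" marker in a byte-sized lookup table. The sketch below is only an illustration under that assumption: the class `LazyCccTable` and the constant `CCC_SENTINEL` are hypothetical names, and the combining-class values come from Python's standard `unicodedata` module, which reflects whatever Unicode version the Python runtime ships rather than Version 6.1 specifically.

```python
import sys
import unicodedata

# 255 is reserved by the stability policy, so it can never collide with a
# real Canonical_Combining_Class value. Here it marks "not yet looked up".
CCC_SENTINEL = 255


class LazyCccTable:
    """Byte-per-code-point cache of combining classes (hypothetical design)."""

    def __init__(self, size: int = sys.maxunicode + 1) -> None:
        self._table = bytearray([CCC_SENTINEL]) * size

    def get(self, cp: int) -> int:
        value = self._table[cp]
        if value == CCC_SENTINEL:
            # Slow path: consult the runtime's Unicode data, then cache it.
            value = unicodedata.combining(chr(cp))
            self._table[cp] = value
        return value


def highest_ccc_in_runtime() -> int:
    """Largest combining class found in this Python's Unicode data."""
    return max(unicodedata.combining(chr(cp)) for cp in range(sys.maxunicode + 1))


table = LazyCccTable()
print(table.get(0x0301))          # combining class of U+0301 COMBINING ACUTE ACCENT
print(highest_ccc_in_runtime())   # stays below 255 under the policy
```

Because every real value fits in 0..254, the table needs no separate "is cached" flag; the sentinel byte carries that information.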

Note: The Unicode Character Encoding Stability Policy restricts possible future changes to the Unicode Standard, but is not formally a part of the standard itself.

D. Textual Changes and Character Additions

732 new character assignments were made to the Unicode Standard, Version 6.1. These additions bring the total number of characters assigned in the standard to 110,116. (That is the traditional count, which totals up graphic and format characters, but omits surrogate code points, ISO control codes, noncharacters, and private-use allocations.)
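
The traditional count described above can be approximated from General_Category values. The following is a rough sketch using Python's standard `unicodedata` module; the function name `traditional_count` and the `EXCLUDED` set are illustrative choices, not part of the standard. Because the module reflects the Unicode version bundled with the Python runtime, which is normally later than 6.1, the number it reports will be larger than 110,116.

```python
import sys
import unicodedata

# General_Category values left out of the traditional count:
#   Cn = unassigned code points (this category also covers noncharacters)
#   Cs = surrogate code points
#   Cc = ISO control codes
#   Co = private-use allocations
EXCLUDED = {"Cn", "Cs", "Cc", "Co"}


def traditional_count() -> int:
    """Count graphic and format characters in the runtime's Unicode data."""
    return sum(
        1
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) not in EXCLUDED
    )


print("Unicode data version:", unicodedata.unidata_version)
print("Graphic and format characters:", traditional_count())
```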

Character Assignment Overview

128 characters have been added to the BMP, while 604 characters have been added in the supplementary planes. Most character additions are in new blocks, but there are also character additions to a number of existing blocks.

New Blocks

The newly-defined blocks in Version 6.1 are:

08A0..08FF Arabic Extended-A
1CC0..1CCF Sundanese Supplement
AAE0..AAFF Meetei Mayek Extensions
10980..1099F Meroitic Hieroglyphs
109A0..109FF Meroitic Cursive
110D0..110FF Sora Sompeng
11100..1114F Chakma
11180..111DF Sharada
11680..116CF Takri
16F00..16F9F Miao
1EE00..1EEFF Arabic Mathematical Alphabetic Symbols
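
For illustration, the block ranges listed above can be turned into a small lookup table. This is a minimal sketch: the names `NEW_BLOCKS_6_1` and `new_block_of` are hypothetical, and the table covers only the blocks new in Version 6.1, not the full set of Unicode blocks.

```python
from typing import Optional

# (first code point, last code point, block name), as listed above.
NEW_BLOCKS_6_1 = [
    (0x08A0, 0x08FF, "Arabic Extended-A"),
    (0x1CC0, 0x1CCF, "Sundanese Supplement"),
    (0xAAE0, 0xAAFF, "Meetei Mayek Extensions"),
    (0x10980, 0x1099F, "Meroitic Hieroglyphs"),
    (0x109A0, 0x109FF, "Meroitic Cursive"),
    (0x110D0, 0x110FF, "Sora Sompeng"),
    (0x11100, 0x1114F, "Chakma"),
    (0x11180, 0x111DF, "Sharada"),
    (0x11680, 0x116CF, "Takri"),
    (0x16F00, 0x16F9F, "Miao"),
    (0x1EE00, 0x1EEFF, "Arabic Mathematical Alphabetic Symbols"),
]


def new_block_of(cp: int) -> Optional[str]:
    """Return the name of the Version 6.1 block containing `cp`, if any."""
    for first, last, name in NEW_BLOCKS_6_1:
        if first <= cp <= last:
            return name
    return None


print(new_block_of(0x11100))  # Chakma
print(new_block_of(0x0041))   # None: U+0041 is not in a block new in 6.1
```

Note that a block reserves a range of code points; not every code point in a new block is an assigned character.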

Text Changes and Additions

Numbers indicate the chapter or section in the Unicode 6.1 core specification where there are some significant changes or additions. This list is not exhaustive. Select changes to conformance requirements in Chapter 3, Conformance, that impact implementations are listed separately under E. Conformance Changes.

E. Conformance Changes

There are several changes to conformance requirements in Unicode 6.1 that impact implementations. The most important of these are:

F. Unicode Character Database Changes

The detailed listing of all changes to the contributory data files of the Unicode Character Database for Version 6.1 can be found in UAX #44, Unicode Character Database. The changes listed there include a number of important property revisions to existing characters that will affect implementations:

Other significant changes resulting from the addition of new characters include:

Other significant changes to the text of the core specification or annexes which may impact implementation include:

G. Unicode Standard Annex Changes

In Version 6.1, many of the Unicode Standard Annexes have had significant revisions. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UAX, linked directly from the following list of UAXes.

Unicode Standard Annex Changes
UAX #9, Unicode Bidirectional Algorithm: No significant changes in this version.
UAX #11, East Asian Width: No significant changes in this version.
UAX #14, Unicode Line Breaking Algorithm: Rule 21a was added to prevent a break between a Hebrew letter and a following hyphen, and the character class HL (Hebrew Letter) was added for that rule. Small kana were moved from class NS to class ID, to align Japanese "kinsoku" more closely with CSS "normal" behavior.
UAX #15, Unicode Normalization Forms: An implementation note on the use of ccc=255 was added. The example code and description of Hangul decomposition and composition were moved into Section 3.12, Conjoining Jamo Behavior, in the core specification. Section 14.1, Optimization Strategies, was rewritten for clarity.
UAX #24, Unicode Script Property: The former Section 4.1 on Script Anomalies for East Asian Symbols was moved to become Section 3.6, and the examples were extended to cover additional unexpected script values for symbols. A description was added for the new property Script_Extensions.
UAX #29, Unicode Text Segmentation: The discussion of Hangul Syllable segmentation was moved from the core specification to this annex and its wording updated slightly. The handling of the Prepend and SpacingMark classes was adjusted so that for the Thai and Lao scripts extended grapheme clusters behave like legacy grapheme clusters, as preferred. Characters with gc=Cs and gc=Cn were added to Control in Table 2, so that they do not join with following Extend characters for defining grapheme cluster boundaries.
UAX #31, Unicode Identifier and Pattern Syntax: New scripts were added to the tables categorizing script usage. Material was added to draw the distinction between the format of identifiers for internal use and the format of identifiers for display. Better guidance was provided on the use of variation sequences.
UAX #34, Unicode Named Character Sequences: No significant changes in this version.
UAX #38, Unicode Han Database (Unihan): The kTotalStrokes and kMandarin fields were redefined. The use of the kTraditionalVariant and kSimplifiedVariant fields was clarified. A new Section 4.4 was added, detailing the ranges of CJK ideographs covered by the Unihan database, with their associated Unicode age values. Each Unihan property that can have multiple values had a specification added to indicate whether the order of values matters, and if so, what the significance of that order is. The regex validity expressions were slightly simplified.
UAX #41, Common References for Unicode Standard Annexes: The references were updated as needed.
UAX #42, Unicode Character Database in XML: New values were added for the age, script, and jg attributes. The values for the ccc attribute were restricted to the 0..254 range, instead of 0..255. The patterns for kIRG_USource and kMandarin were updated to reflect changes in the Unihan database. A new element was added for the Name_Alias property, and new attributes were added for the Block and Script_Extensions properties. A clarification was added to distinguish attributes with empty string values from missing attributes. In particular, the absence of a numeric value is now represented by NaN. The value of the fc_nfkc attribute must now be either # or one-or-more-code-points.
UAX #44, Unicode Character Database: Text was added regarding the reserved value 255 for Canonical_Combining_Class. Grouped values for General_Category were added to the table of values for that property. The status and description of Grapheme_Base and Grapheme_Extend were updated. The tables of regular expressions for validation of property values were updated. An entry was added to the Property Table for the new Script_Extensions provisional property. The description of the Name_Alias property was updated. A new section describing multivalued properties was added. There are various other small editorial fixes to the text.
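
As an illustration of the Hangul decomposition and composition material that the UAX #15 entry above says was moved into Section 3.12 of the core specification, the following sketch shows the arithmetic mapping between precomposed Hangul syllables and conjoining jamo. It is a simplified rendering, not the specification's own sample code: the function names are illustrative, and `compose_hangul` assumes it is given valid leading, vowel, and optional trailing jamo.

```python
import unicodedata

# Constants for Hangul syllable arithmetic.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables


def decompose_hangul(syllable: str) -> str:
    """Fully decompose one precomposed Hangul syllable into jamo."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return syllable  # not a precomposed Hangul syllable
    lead = chr(L_BASE + s_index // N_COUNT)
    vowel = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    t_index = s_index % T_COUNT
    # A trailing index of 0 means the syllable has no trailing consonant.
    trail = chr(T_BASE + t_index) if t_index else ""
    return lead + vowel + trail


def compose_hangul(jamo: str) -> str:
    """Compose a leading + vowel (+ trailing) jamo sequence into a syllable."""
    l_index = ord(jamo[0]) - L_BASE
    v_index = ord(jamo[1]) - V_BASE
    t_index = ord(jamo[2]) - T_BASE if len(jamo) > 2 else 0
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


parts = decompose_hangul("\uD55C")
print(" ".join(f"U+{ord(ch):04X}" for ch in parts))  # U+1112 U+1161 U+11AB
print(compose_hangul(parts) == "\uD55C")             # True
```

The result can be cross-checked against `unicodedata.normalize("NFD", ...)` from the Python standard library, which should yield the same jamo sequence for a precomposed syllable.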
