Unicode 6.3.0

Released: 2013 September 30 (Announcement)

Version 6.3.0 has been superseded by the latest version of the Unicode Standard.

This page summarizes the important changes for the Unicode Standard, Version 6.3.0.

The core specification was not republished for Version 6.3. Thus the chapters of the core specification use the Version 6.2.0 PDF files.

A. Summary
B. Version Information
C. Stability Policy Update
D. Textual Changes and Character Additions
E. Conformance Changes
F. Changes in the Unicode Character Database
G. Changes in the Unicode Standard Annexes
H. Changes in Synchronized Unicode Technical Standards

A. Summary

Version 6.3 of the Unicode Standard is a special release focused on delivering significantly improved bidirectional behavior.

Bidirectional Behavior Improvements

This new version updates the Unicode Bidirectional Algorithm to ensure that pairs of parentheses and brackets have consistent layout and to provide a mechanism for isolating runs of text.

The updated Bidirectional Algorithm together with five newly introduced bidi format characters will improve the display of text for hundreds of millions of users of Arabic, Hebrew, Persian, Urdu, and many others. The display and positioning of parentheses will better match the normal behavior that users expect. By using the new methods for isolating runs of text, software will be able to construct messages from different sources without jumbling the order of characters. The new bidi format characters correspond to features in markup (such as in CSS). Overall, these improvements bring greater interoperability and an improved ability for inserting text and assembling user interface elements in these languages.
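As a rough illustration of isolating runs of text when a message is assembled from different sources, the following Python sketch wraps each inserted value in U+2068 FIRST STRONG ISOLATE and U+2069 POP DIRECTIONAL ISOLATE. The helper names `isolate` and `build_message` and the sample template are hypothetical; this is one possible approach, not a procedure prescribed by the standard.

```python
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate(fragment: str) -> str:
    """Wrap a fragment so its direction is resolved independently of its surroundings."""
    return f"{FSI}{fragment}{PDI}"


def build_message(template: str, **fields: str) -> str:
    """Fill a str.format template, isolating every inserted value (hypothetical helper)."""
    return template.format(**{key: isolate(value) for key, value in fields.items()})


# A Hebrew user name inserted into an English sentence.
message = build_message(
    "{user} uploaded {count} photos",
    user="\u05D3\u05E0\u05D9",
    count="3",
)
```

Whether the isolates affect rendering depends on the display engine implementing the Version 6.3 Bidirectional Algorithm.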

The improvements come with new rigor: the Consortium now offers two reference implementations and greatly improved testing and test data.

Other Enhancements

In a major enhancement for CJK usage, this new version adds standardized variation sequences for all 1,002 CJK compatibility ideographs. These sequences address a well-known issue with the CJK compatibility ideographs: their appearance could change whenever a process normalized the text. Using the new standardized variation sequences allows authors to write text that preserves the specific required shapes of these CJK ideographs, even under Unicode normalization.
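The normalization issue can be observed with Python's `unicodedata` module. The sketch below uses U+F900 as an example and assumes that its standardized variation sequence is U+8C48 followed by U+FE00 VARIATION SELECTOR-1; that pairing should be confirmed against StandardizedVariants.txt. The helper name `survives_normalization` is hypothetical.

```python
import unicodedata

COMPAT = "\uF900"             # CJK COMPATIBILITY IDEOGRAPH-F900
UNIFIED = "\u8C48"            # canonical decomposition of U+F900
VS1 = "\uFE00"                # VARIATION SELECTOR-1
# Assumed standardized variation sequence for U+F900 (verify in StandardizedVariants.txt).
VARIATION_SEQUENCE = UNIFIED + VS1

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def survives_normalization(text: str) -> bool:
    """Return True if the text is unchanged by all four normalization forms."""
    return all(unicodedata.normalize(form, text) == text for form in FORMS)


compat_is_stable = survives_normalization(COMPAT)                 # compatibility ideograph is replaced
sequence_is_stable = survives_normalization(VARIATION_SEQUENCE)   # variation sequence is kept
```

The compatibility ideograph is replaced by its unified counterpart under every normalization form, while the variation sequence passes through unchanged.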

Version 6.3 includes other improvements as well:

This version also rolls in a change in Definition D136 (case-ignorable) of the core specification, various minor corrections for errata, and other small updates for the Unicode Character Database.

Synchronization

Two other important Unicode specifications are maintained in synchrony with the Unicode Standard and have updates for Version 6.3: UTS #10, Unicode Collation Algorithm, and UTS #46, Unicode IDNA Compatibility Processing.

This version of the Unicode Standard is synchronized with ISO/IEC 10646:2012, plus the accelerated publication of 5 bidirectional format control characters: U+061C ARABIC LETTER MARK and the isolate span controls U+2066..U+2069.

See Sections D through H below for additional details regarding the changes in this version of the Unicode Standard, its associated annexes, and the other synchronized Unicode specifications.

B. Version Information

Version 6.3 of the Unicode Standard consists of the core specification (unchanged from Version 6.2, except for Definition D136), the delta and archival code charts for this version, the Unicode Standard Annexes, and the Unicode Character Database (UCD).

The core specification gives the general principles, requirements for conformance, and guidelines for implementers. The code charts show representative glyphs for all the Unicode characters. The Unicode Standard Annexes supply detailed normative information about particular aspects of the standard. The Unicode Character Database supplies normative and informative data for implementers to allow them to implement the Unicode Standard.

Version 6.3.0 of the Unicode Standard should be referenced as:

The Unicode Consortium. The Unicode Standard, Version 6.3.0, (Mountain View, CA: The Unicode Consortium, 2013. ISBN 978-1-936213-08-5)
http://www.unicode.org/versions/Unicode6.3.0/

The terms “Version 6.3” or “Unicode 6.3” are abbreviations for the full version reference, Version 6.3.0.

The citation and permalink for the latest published version of the Unicode Standard is:

The Unicode Consortium. The Unicode Standard.
http://www.unicode.org/versions/latest/

A complete specification of the contributory files for Unicode 6.3 is found on the page Components for 6.3.0. That page also provides the recommended reference format for Unicode Standard Annexes.

The navigation bar on the left of this page provides links to the core specification as a single file, as well as to individual chapters and the appendices. Also provided are links to the code charts, the radical-stroke indices to CJK ideographs, the Unicode Standard Annexes, and the data files for Version 6.3 of the Unicode Character Database.

Code Charts

Several sets of code charts are available. They serve different purposes:

For Unicode 6.3.0 in particular two additional sets of code chart pages are provided:

The delta and archival code charts are a stable part of this release of the Unicode Standard. They will never be updated.

Errata

Errata incorporated into Unicode 6.3 are listed by date in a separate table. For corrigenda and errata after the release of Unicode 6.3, see the list of current Updates and Errata.

C. Stability Policy Update

The statement of the stability policy for the Bidi_Class property was slightly reworded to clarify the exact type of changes allowed for it. This update is related to the changes in Unicode 6.3.0 for the Unicode Bidirectional Algorithm.

A constraint was added for the new Bidi_Paired_Bracket_Type (bpt) property, to guarantee that characters given either bpt=Open or bpt=Close (intended to be limited to paired brackets) also have Bidi_Class=ON and Bidi_Mirrored=Yes, for consistency.
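A quick spot check of this constraint is possible in Python. The `unicodedata` module does not expose Bidi_Paired_Bracket_Type, so the sketch below relies on a small hand-picked sample of characters assumed to have bpt=Open or bpt=Close; the authoritative list is BidiBrackets.txt. The helper name is hypothetical.

```python
import unicodedata

# Hand-picked sample assumed to have bpt=Open or bpt=Close (see BidiBrackets.txt).
SAMPLE_BRACKETS = "()[]{}\u300C\u300D"


def satisfies_bracket_constraint(ch: str) -> bool:
    """Check Bidi_Class=ON and Bidi_Mirrored=Yes for a single character."""
    return unicodedata.bidirectional(ch) == "ON" and unicodedata.mirrored(ch) == 1


all_ok = all(satisfies_bracket_constraint(ch) for ch in SAMPLE_BRACKETS)
```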

A new constraint was added to guarantee that characters with the General_Category property value Number also have a Numeric_Type property value distinct from None.
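The following Python sketch looks for violations of this constraint in a set of characters. Python's `unicodedata` module does not report Numeric_Type directly, so the sketch approximates "Numeric_Type distinct from None" by "has a numeric value"; that approximation and the helper names are assumptions made for illustration.

```python
import unicodedata


def has_numeric_type(ch: str) -> bool:
    """Approximate Numeric_Type != None by the presence of a numeric value."""
    return unicodedata.numeric(ch, None) is not None


def violations(chars: str) -> list:
    """Return characters with General_Category N* that lack a numeric value."""
    return [
        ch
        for ch in chars
        if unicodedata.category(ch).startswith("N") and not has_numeric_type(ch)
    ]


# DIGIT FIVE (Nd), VULGAR FRACTION ONE HALF (No), ROMAN NUMERAL EIGHT (Nl), and a letter.
sample_violations = violations("5\u00BD\u2167A")
```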

For details about each of these changes or additions, see Property Value Stability.

Note: The Unicode Character Encoding Stability Policy restricts possible future changes to the Unicode Standard, but is not formally a part of the standard itself.

D. Textual Changes and Character Additions

In Version 6.3 of the core specification, Section 3.13, Default Case Algorithms, Definition D136 has been updated as follows:

D136. A character C is defined to be case-ignorable if C has the value MidLetter (ML), MidNumLet (MB), or Single_Quote (SQ) for the Word_Break property or its General_Category is one of Nonspacing_Mark (Mn), Enclosing_Mark (Me), Format (Cf), Modifier_Letter (Lm), or Modifier_Symbol (Sk).
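Definition D136 can be sketched in Python as follows. The General_Category test uses `unicodedata.category`. Python's standard library does not expose Word_Break, so the sketch uses a deliberately incomplete, hand-written sample table (`SAMPLE_WORD_BREAK`, a hypothetical name); a real implementation would load WordBreakProperty.txt.

```python
import unicodedata

IGNORABLE_CATEGORIES = {"Mn", "Me", "Cf", "Lm", "Sk"}
IGNORABLE_WORD_BREAK = {"ML", "MB", "SQ"}  # MidLetter, MidNumLet, Single_Quote

# Incomplete sample of Word_Break values, for illustration only.
SAMPLE_WORD_BREAK = {
    "\u0027": "SQ",  # APOSTROPHE
    "\u002E": "MB",  # FULL STOP
    "\u003A": "ML",  # COLON
    "\u2019": "MB",  # RIGHT SINGLE QUOTATION MARK
}


def is_case_ignorable(ch: str) -> bool:
    """Apply D136 using the sample Word_Break table and General_Category."""
    if SAMPLE_WORD_BREAK.get(ch) in IGNORABLE_WORD_BREAK:
        return True
    return unicodedata.category(ch) in IGNORABLE_CATEGORIES
```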

Changes in the Unicode Standard Annexes are listed in Section G.

Character Assignment Overview

Five new character assignments were made for the Unicode Standard, Version 6.3, as shown in the following table. This addition brings the total number of characters assigned in the standard to 110,122. (That is the traditional count, which totals up graphic and format characters, but omits surrogate code points, ISO control codes, noncharacters, and private-use allocations.)

U+061C ARABIC LETTER MARK
U+2066 LEFT-TO-RIGHT ISOLATE
U+2067 RIGHT-TO-LEFT ISOLATE
U+2068 FIRST STRONG ISOLATE
U+2069 POP DIRECTIONAL ISOLATE
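The properties of these five characters can be inspected with Python's `unicodedata` module, provided the interpreter ships Unicode data from Version 6.3 or later (assumed here; `unicodedata.unidata_version` reports the version in use). The helper name `describe` is hypothetical.

```python
import unicodedata

NEW_CHARACTERS = ["\u061C", "\u2066", "\u2067", "\u2068", "\u2069"]


def describe(ch: str) -> tuple:
    """Return (code point, name, Bidi_Class, General_Category) for a character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.bidirectional(ch),
        unicodedata.category(ch),
    )


for ch in NEW_CHARACTERS:
    print(*describe(ch), sep="  ")
```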

No new blocks are defined in Version 6.3.

E. Conformance Changes

In Version 6.3 of the core specification, the derivation of the property Case_Ignorable in Definition D136 has been updated to account for the change in the Word_Break property value of U+0027 APOSTROPHE from MidNumLet to Single_Quote.

Except for the update to Definition D136, there are no significant conformance changes in the core specification. However, there are significant conformance changes to the Unicode Bidirectional Algorithm in UAX #9, which may also affect incidental discussion about the Unicode Bidirectional Algorithm in several sections of the core specification.

F. Changes in the Unicode Character Database

The detailed listing of all changes to the contributory data files of the Unicode Character Database for Version 6.3 can be found in UAX #44, Unicode Character Database. The changes listed there include character additions and property revisions to existing characters that will affect implementations. The most notable changes are summarized below.

Miscellaneous Changes

G. Changes in the Unicode Standard Annexes

In Version 6.3, many of the Unicode Standard Annexes have had significant revisions. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UAX, linked directly from the following list of UAXes.

Unicode Standard Annex: Changes

UAX #9, Unicode Bidirectional Algorithm: The Unicode Bidirectional Algorithm was substantially extended to support isolate runs and to resolve paired brackets as a unit. For the former extension, four new Bidi_Class property values were added. For the latter, two normative properties and an algorithm rule N0 were introduced. Additional definitions, rule revisions, notes, and examples were included, and a new test file was added.

UAX #11, East Asian Width: No significant changes in this version.

UAX #14, Unicode Line Breaking Algorithm: The description of the CM class was updated to reflect a refinement in line breaking for U+3035 VERTICAL KANA REPEAT MARK LOWER HALF, and the description of the BA class was updated to reflect a change for U+3000 IDEOGRAPHIC SPACE.

UAX #15, Unicode Normalization Forms: No significant changes in this version.

UAX #24, Unicode Script Property: No significant changes in this version.

UAX #29, Unicode Text Segmentation: There were some minor updates made for word segmentation. Apostrophe and double quote are now allowed within a strictly Hebrew word context, to reflect their common use in place of geresh and gershayim.

UAX #31, Unicode Identifier and Pattern Syntax: No significant changes in this version.

UAX #34, Unicode Named Character Sequences: No significant changes in this version.

UAX #38, Unicode Han Database (Unihan): The status of kCompatibilityVariant was clarified. kHanyuPinlu was changed to use accents instead of numbers for tones, and the regular expression for it was modified accordingly. Many other minor documentation updates were made.

UAX #41, Common References for Unicode Standard Annexes: Minor updates were made to the references.

UAX #42, Unicode Character Database in XML: Changes were made to track additional properties and property values for the Unicode Bidirectional Algorithm.

UAX #44, Unicode Character Database: The status of default values was clarified. Numerous changes were made to reflect changes to the Unicode Bidirectional Algorithm and its associated character properties and data files. A clarification was added about Numeric_Type=Digit.

UAX #45, U-Source Ideographs: 245 characters were added to the list of U-Source ideographs. A new status of UNC-2013 was added and documented.

H. Changes in Synchronized Unicode Technical Standards

There are also significant revisions in the Unicode Technical Standards whose versions are synchronized with the Unicode Standard. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UTS, linked directly from the following list of UTSes.

Unicode Technical Standard: Changes

UTS #10, Unicode Collation Algorithm: The CLDR root collation data files contained in CollationAuxiliary.zip, along with the related documentation, have been moved from the UCA release directory to the root collation data files in the CLDR repository. Trailing collation elements are now given regular tertiary weights in DUCET, which allows for full case differences among compatibility characters. Digits from all scripts are now given the same weights as ASCII digits in DUCET, rather than being distinguished by secondary weights. The IgnoreSP option for handling variables (intended for ignoring punctuation but not symbols) has been removed. The weights 0xFFFD..0xFFFF are now reserved for special collation elements. In addition, the text of UTS #10 has been reorganized for better flow.

UTS #46, Unicode IDNA Compatibility Processing: The five new bidirectional format controls were added. They are given the value ignored in IdnaMappingTable.txt. They have the status disallowed in IDNA2008.
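In UTS #46 processing, a code point with the status ignored is removed from the input during the mapping step. The Python sketch below shows only that effect for the five new bidirectional format controls; it is a tiny, hypothetical slice of the mapping step and not an implementation of UTS #46. The names `IGNORED_BIDI_CONTROLS` and `drop_ignored_bidi_controls` are made up for illustration.

```python
# The five bidirectional format controls added in Version 6.3.
IGNORED_BIDI_CONTROLS = frozenset("\u061C\u2066\u2067\u2068\u2069")


def drop_ignored_bidi_controls(label: str) -> str:
    """Remove the five new bidi format controls from a label, leaving the rest unchanged."""
    return "".join(ch for ch in label if ch not in IGNORED_BIDI_CONTROLS)


cleaned = drop_ignored_bidi_controls("ex\u2066am\u2069ple")
```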
