Unicode 3.0.0
Version 3.0.0 has been superseded by the latest version of the Unicode Standard.
Version 3.0.0 of the Unicode Standard consists of the core specification, The Unicode Standard, Version 3.0, the code charts for this version (currently only available in hard copy), five Unicode Technical Reports, and the 3.0 Update of the Unicode Character Database (UCD). The core specification gives the general principles, requirements for conformance, and guidelines for implementers. The code charts show representative glyphs for all the Unicode characters. The Unicode Technical Reports supply detailed information about particular aspects of the standard. The Unicode Character Database supplies normative and informative data for implementers to allow them to implement the Unicode Standard. A complete specification of the contributory files for Unicode 3.0.0 is found on the page Components for 3.0.0. That page also provides the recommended reference format for this version of the Unicode Standard.
Online Edition
The text of The Unicode Standard, Version 3.0 (ISBN 0-201-61633-5) is available online via the navigation links on this page, with the exception of the code charts and the Han radical-stroke indices. A slightly modified HTML version of Chapter 1 has also been provided. Printing from the PDF files has been disabled. Normative references to the Unicode Standard, Version 3.0 should use the printed edition.
Overview
Unicode 3.0.0 is a major version of the Unicode Standard and supersedes all previous versions. This page summarizes the important changes for the Unicode Standard, Version 3.0.0. In the discussion below, shortened references to "Unicode 3.0" or "Version 3.0" specifically refer to Version 3.0.0.
The core specification, The Unicode Standard, Version 3.0, contains descriptions and properties for many new characters. It is synchronized with ISO/IEC 10646-1 second edition. The text of the standard has been extensively rewritten to improve its structure and clarity.
Unicode 3.0 also includes enhanced implementation guidelines, and has been reorganized to describe related scripts within separate chapters. In addition to new characters, there are significant clarifications or modifications to character semantics from Unicode 2.0 to Unicode 3.0.
The vast majority of implementations of earlier versions will be conformant to Unicode 3.0.0 once the character properties for their supported characters are updated to Version 3.0.0 of the Unicode Character Database.
The most significant additions to the standard include the following:
- Transformation Formats. The precise definitions of the common Unicode Transformation Formats are provided, including UTF-8, UTF-16, UTF-16BE, and UTF-16LE. The relations between abstract characters, code points (scalar values) and code units (8, 16 or 32 bit) are clarified.
- Bidirectional properties. Bidirectional properties are now more consistent with the general category property, and new bidirectional properties were created. See UTR #9: The Bidirectional Algorithm.
- Case. Case properties have been extended for those situations where there is a mapping to multiple characters and where case is locale dependent.
- Combining classes. These were updated significantly to resolve problems of normalization and decomposition for Indic scripts in particular.
- Decomposition and Composition. Unicode character decompositions have been significantly updated to fix errors in the original assignments, to allow correct collation weighting, and to make decompositions consistent for normalization. Certain characters are excluded from composition, and the precise algorithm for composition is provided. See UTR #15: Unicode Normalization Forms.
- General Category. A series of general category changes were made to assist the convergence of the Unicode definition of identifier with ISO TR 10176.
- Newlines. Line handling characteristics have been documented more fully for Unicode environments. See UTR #13: Unicode Newline Guidelines.
- Quotation Marks. Two new punctuation categories, Pi and Pf, were created for initial and final quotes with properties that vary by language.
- Linebreak properties. Linebreaking properties (normative and informative) are added to the standard to support consistent linebreaking behavior over all Unicode characters. See UTR #14: Line Breaking Properties.
- East-Asian width properties. Properties for supporting correct choice of full-width vs. half-width glyphs in an East Asian context are provided. See UTR #11: East Asian Width.
- Specific Characters:
  - The use of the byte order mark with transformation formats is clarified.
  - Use of line and paragraph separators is clarified.
  - Capital letters with iota adscript. The representative glyphs, semantics, case mappings and decompositions have been revised to make their handling more consistent.
  - Consonant RA rules have been updated and expanded to cover Eyelash Ra.
  - U+2007 FIGURE SPACE is no longer treated like a numeric separator for purposes of bidirectional layout.
  - The description of layout controls was enhanced to include the behavior of U+00A0 NO-BREAK SPACE, U+00AD SOFT HYPHEN, and zero-width spaces.
  - The use of U+007E TILDE as a spacing clone of combining tilde and as a regular character is described more completely.
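The distinction between code points and code units in the transformation formats listed above can be illustrated with a short Python sketch. It relies on Python's built-in codecs, which follow the current definitions of UTF-8 and UTF-16 rather than the Version 3.0 text; the helper name `code_units` and the sample characters are illustrative choices, not part of the standard.

```python
import codecs


def code_units(text, encoding, unit_bytes):
    """Return the serialized code units of `text` as upper-case hex strings.

    Each string shows the bytes of one code unit in the order they
    appear in the encoded byte sequence.
    """
    data = text.encode(encoding)
    return [
        data[i:i + unit_bytes].hex().upper()
        for i in range(0, len(data), unit_bytes)
    ]


euro = "\u20ac"  # U+20AC EURO SIGN, a single code point

utf8_units = code_units(euro, "utf-8", 1)        # three 8-bit code units
utf16be_units = code_units(euro, "utf-16-be", 2)  # one 16-bit unit, high byte first
utf16le_units = code_units(euro, "utf-16-le", 2)  # same unit, low byte first

# A code point above U+FFFF needs two 16-bit code units (a surrogate pair).
supplementary = "\U00010000"
surrogate_pair = code_units(supplementary, "utf-16-be", 2)

# Python's unmarked "utf-16" encoder prepends a byte order mark (U+FEFF);
# the explicit UTF-16BE and UTF-16LE encoders do not.
has_bom = euro.encode("utf-16").startswith(
    (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)
)
no_bom = not euro.encode("utf-16-be").startswith(codecs.BOM_UTF16_BE)

print(utf8_units, utf16be_units, utf16le_units, surrogate_pair)
```

The same abstract character therefore maps to one code point but to a different number of code units, and a different byte order, depending on the format chosen.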
New Characters
The new characters added to Unicode 3.0 are summarized in the following table:
Unicode 3.0 Summary
| Category | V 2.1 | V 3.0 |
|---|---|---|
| Alphabetics, Symbols | 6511 | 10236 |
| CJK Ideographs | 21204 | 27786 |
| Hangul Syllables | 11172 | 11172 |
| Total assigned characters | 38887 | 49194 |
| Private Use | 6400 | 6400 |
| Surrogates | 2048 | 2048 |
| Controls | 65 | 65 |
| Not Characters | 2 | 2 |
| Total assigned 16-bit code values | 47402 | 57709 |
| Unassigned 16-bit code values | 18134 | 7827 |

Besides adding characters to existing blocks, Unicode 3.0 adds a number of new blocks, listed below, and including the number of code points allocated to each block. For a list of all the blocks in Unicode 3.0, see Blocks.txt.
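The figures in the Unicode 3.0 Summary can be cross-checked with simple arithmetic. The sketch below copies the counts from the summary and confirms that the subtotals add up and that assigned plus unassigned values cover the full 16-bit range; the dictionary and function names are illustrative only.

```python
# (Version 2.1, Version 3.0) counts copied from the summary.
ASSIGNED_CHARACTERS = {
    "Alphabetics, Symbols": (6511, 10236),
    "CJK Ideographs": (21204, 27786),
    "Hangul Syllables": (11172, 11172),
}
OTHER_ASSIGNED = {
    "Private Use": (6400, 6400),
    "Surrogates": (2048, 2048),
    "Controls": (65, 65),
    "Not Characters": (2, 2),
}
UNASSIGNED = (18134, 7827)


def column_totals(rows):
    """Sum the Version 2.1 and Version 3.0 columns separately."""
    v21 = sum(old for old, _ in rows.values())
    v30 = sum(new for _, new in rows.values())
    return v21, v30


chars_total = column_totals(ASSIGNED_CHARACTERS)   # total assigned characters
other_total = column_totals(OTHER_ASSIGNED)        # non-character assignments
assigned_total = tuple(c + o for c, o in zip(chars_total, other_total))
code_space = tuple(a + u for a, u in zip(assigned_total, UNASSIGNED))

# Characters added between Version 2.1 and Version 3.0.
new_characters = chars_total[1] - chars_total[0]

print(chars_total, assigned_total, code_space, new_characters)
```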
New Blocks
| Number | Block Name |
|---|---|
| 80 | Syriac |
| 192 | Thaana |
| 128 | Sinhala |
| 160 | Myanmar |
| 384 | Ethiopic |
| 96 | Cherokee |
| 640 | Unified Canadian Aboriginal Syllabics |
| 32 | Ogham |
| 96 | Runic |
| 128 | Khmer |
| 176 | Mongolian |
| 256 | Braille Patterns |
| 128 | CJK Radicals Supplement |
| 224 | Kangxi Radicals |
| 16 | Ideographic Description Characters |
| 32 | Bopomofo Extended |
| 6582 | CJK Unified Ideographs Extension A |
| 1168 | Yi Syllables |
| 64 | Yi Radicals |

Conformance Changes
Conformance clauses, definitions, and explanatory text were added for handling Unicode Transformation Formats. The Unicode Bidirectional Behavior algorithm rules were clarified and expanded, and new bidirectional character properties were documented. Other normative character property values were changed; see the Unicode character database file for more information.
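As a rough illustration of the kinds of character properties discussed on this page, the following Python sketch reads a few of them through the standard `unicodedata` module. One important assumption: `unicodedata` exposes the Unicode Character Database version bundled with the interpreter (reported by `unidata_version`), not Version 3.0.0, so values for some characters may differ from the 3.0.0 data files. The `describe` helper is hypothetical.

```python
import unicodedata

# Version of the UCD bundled with this interpreter (not 3.0.0).
ucd_version = unicodedata.unidata_version


def describe(ch):
    """Collect a few UCD properties for a single character."""
    return {
        "name": unicodedata.name(ch, "<unnamed>"),
        "general_category": unicodedata.category(ch),
        "bidi_class": unicodedata.bidirectional(ch),
        "combining_class": unicodedata.combining(ch),
        "east_asian_width": unicodedata.east_asian_width(ch),
    }


# Initial and final quotation marks use the Pi and Pf categories.
left_quote = describe("\u00ab")   # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
right_quote = describe("\u00bb")  # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK

# Combining marks carry a non-zero canonical combining class.
acute = describe("\u0301")        # COMBINING ACUTE ACCENT

# East Asian width separates fullwidth forms from ordinary Latin letters.
fullwidth_a = describe("\uff21")  # FULLWIDTH LATIN CAPITAL LETTER A
latin_a = describe("A")

# A case mapping that expands to multiple characters.
upper_sharp_s = "\u00df".upper()  # LATIN SMALL LETTER SHARP S

for info in (left_quote, right_quote, acute, fullwidth_a, latin_a):
    print(info)
```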
Unicode Technical Reports
The following technical reports are approved and considered part of the Unicode Standard, Version 3.0. These reports may contain either normative or informative material, or both. Any reference to version 3.0 of the standard automatically includes these technical reports.
- UTR #9: The Bidirectional Algorithm, Version 6.0
- UTR #11: East Asian Width, Version 5.0
- UTR #13: Unicode Newline Guidelines, Version 5.0
- UTR #14: Line Breaking Properties, Version 6.0
- UTR #15: Unicode Normalization Forms, Version 18.0
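The decomposition, canonical ordering, and composition-exclusion behavior covered by UTR #15 can be observed with Python's `unicodedata.normalize`. This is only a sketch: the function implements the normalization forms for the Unicode version bundled with the interpreter, not specifically the Version 18.0 report listed above, and the sample strings are illustrative.

```python
import unicodedata

# Canonical decomposition and recomposition of a precomposed letter.
composed = "\u00e9"        # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"     # e + COMBINING ACUTE ACCENT
nfd_of_composed = unicodedata.normalize("NFD", composed)
nfc_of_decomposed = unicodedata.normalize("NFC", decomposed)

# Canonical ordering: marks with different combining classes are sorted,
# so dot below (class 220) moves ahead of acute (class 230).
unordered = "a\u0301\u0323"
ordered = unicodedata.normalize("NFD", unordered)

# Composition exclusion: U+0958 DEVANAGARI LETTER QA decomposes to
# U+0915 U+093C and is not recomposed by NFC.
qa = "\u0958"
qa_nfc = unicodedata.normalize("NFC", qa)

print([hex(ord(c)) for c in nfd_of_composed])
print([hex(ord(c)) for c in ordered])
print([hex(ord(c)) for c in qa_nfc])
```

Comparing strings only after normalizing both to the same form avoids treating canonically equivalent sequences as different.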