Unicode 1.1

Version 1.1 has been superseded by the latest version of the Unicode Standard.

Version 1.1 of the Unicode Standard consists of the core specification, The Unicode Standard, Version 1.0 (Volume 1 and Volume 2), as modified by Unicode Technical Report #4, The Unicode Standard, Version 1.1, and the 1.1 Update of the Unicode Character Database (UCD). The core specification gives the general principles, requirements for conformance, and guidelines for implementers. The code charts show representative glyphs for all the Unicode characters. The Unicode Character Database supplies normative and informative data that implementers need to implement the Unicode Standard.

A complete specification of the contributory files for Unicode 1.1 is found on the page Components for 1.1.0. An updated specification, including the Version 1.1.5 Unicode Character Database, is found on the page Components for 1.1.5.

Online Edition
This online edition of The Unicode Standard, Version 1.1 provides a digital archive of this early version of the Unicode Standard, for historical purposes. To view the text, follow the navigation links on this page. Although this version is out of print, normative references to the Unicode Standard, Version 1.1 should continue to use the printed edition.

Because the PDF files are scanned from hard copy, they are larger than the PDF files for versions that were produced directly from the editing tools (starting with Version 3.0). Sizes are not listed for the individual chapters and other files, but some of them are large, in particular Appendix I, Unicode 1.1 Character List.

Overview
Unicode 1.1 is a minor version of the Unicode Standard published in 1993. It supersedes all previous versions.

Unicode Technical Report #4, The Unicode Standard, Version 1.1, contains descriptions and properties for many additional characters beyond the repertoire documented in the core specification for Unicode 1.0. The repertoire and code points for Version 1.1 are synchronized with ISO/IEC 10646-1:1993.

Unicode 1.1.5 is nominally an update version of the Unicode Standard, augmented by the first release of machine-readable data files corresponding to the repertoire for Unicode 1.1. The UnicodeData.txt data file was first made available in July, 1995, so that is taken as the date of record for Unicode 1.1.5. At the time, that data file was considered an "add on" to Unicode 1.1, rather than a formal superseding of the Unicode 1.1.0 version, because no prior UnicodeData.txt file had existed for it to supersede. All citations at the time were simply made to "Unicode 1.1"; the formal mechanism of update versions to the standard was first fully spelled out as of Unicode 2.0.0.

Historical Text Notes
Unicode 1.1—and the publication of Unicode Technical Report #4 to document it—were necessary to account for the synchronization of the Unicode Standard with ISO/IEC 10646-1:1993. Even as the Unicode Consortium was completing its initial publication of Unicode 1.0 during 1991 and 1992, there was a wide-ranging and contentious set of negotiations underway to effect the convergence of architecture and repertoire between the Unicode Standard and the International Standard 10646-1. As of 1991, ISO/IEC 10646-1 had failed a crucial ballot, and a new ballot was prepared for that project, which took all of Unicode 1.0 on board—including the critical architectural feature of a unified encoding of the Han script. The Unicode Consortium undertook plans to issue a new version of its standard to match the outcome of the balloting on ISO/IEC 10646-1, so that when ISO/IEC 10646-1:1993 was finally published, there would be a corresponding version of the Unicode Standard published, as well. This process and agreement were known at the time as the "Unicode/10646 merger".

In 1993, the urgency to complete publication of a new version of the Unicode Standard, coupled with the challenge of dealing with the major reorganization and extension of the code charts it required, led the Unicode Consortium to adopt an interim strategy of issuing a minor version publication which simply documented all of the known changes and additions for Unicode 1.1, rather than attempting a full republication of a major version of the standard in book form. That latter project took three more years, and eventually led to the publication of Unicode 2.0 in 1996.

In part because of the urgency to publish it, Unicode Technical Report #4 was structured essentially as a set of release notes, detailing various changes and additions, with a large names list appendix simply listing all the characters and their code points, without showing glyphs in code charts. Despite this ad hoc structure, UTR #4 was an extremely important reference document for implementers from 1993 to early 1996, when it was finally superseded by Unicode 2.0. UTR #4 was the only available comprehensive specification of the Unicode 1.1 content, which differed in many significant respects from Unicode 1.0. Only a relatively small number of copies of UTR #4 were ever produced. They were distributed directly to Unicode Consortium members at the time, and to a few other implementers and standardizers who requested copies. UTR #4 quickly went out of print once the Unicode 2.0 book was published. It was not available online until this historical Online Edition was prepared in 2015.

The detailed text notes below aim at providing a historical context for the various chunks of text which appear in Unicode Technical Report #4.

Chapter 2.0, Changes in Unicode 1.0, is derived directly from Sections 1 through 6 and Section 9 of the Unicode 1.0.1 Notice, which provided the final details of the block changes, character deletions, and character moves resulting from the Unicode/10646 merger. The text of this chapter then became the seed for the text of Section D.2, Changes from Unicode 1.0 to 1.1, in Unicode 2.0.

In Chapter 3.0, New Character Semantics, Section 3.2, Zero-Width Joining, and Section 3.3, Byte Order Mark, are derived directly from Section 7, Character semantics changed, of the Unicode 1.0.1 Notice. Chapter 4.0, Additional Character Semantics, and Chapter 5.0, Conjoining Korean Jamos, constitute new content in 1.1. The text of these chapters was then distributed into the relevant block descriptions of Unicode 2.0.

Appendix A, Errata, documented known errata in the text of Volume 2 of Unicode 1.0. These errata were simply corrected as the corresponding text, figures, or glyphs were rolled forward into Unicode 2.0.

Appendix B, Han Compatibility Mappings, documented a systematic off-by-one error in the table of mappings for the CJK Compatibility Ideographs block printed in Volume 2 of Unicode 1.0. It also provided source information about the CJK Compatibility Ideographs block that was carried forward into the block description in Unicode 2.0.

Appendix C, Implementing Korean Jamos discussed various implementation issues related to the new conjoining jamos in 1.1, including collation, rendering, keyboard input, and application compatibility. Most of this text was either extensively rewritten or dropped later, but some small sections were carried forward into Unicode 2.0.

Appendix D, Canonical Ordering Priorities, was the first publication of the numerical values for what much later came to be known as the Canonical_Combining_Class property used by the Unicode Normalization Algorithm. This tabular data was moved into Section 4.2, Combining Classes, in Unicode 2.0, after extensive modifications were made to move Indic dependent vowels from "fixed position classes" to combining class 0.
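As an illustration of what these ordering priorities are used for, the following minimal sketch uses Python's standard `unicodedata` module to read combining classes and to apply canonical ordering through NFD normalization. It reflects the Unicode version bundled with the Python interpreter, not the values printed in UTR #4; the helper names `combining_classes` and `canonically_ordered` are hypothetical and chosen only for this example.

```python
import unicodedata

ACUTE = "\u0301"      # COMBINING ACUTE ACCENT, class 230 (above)
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, class 220 (below)


def combining_classes(text):
    """Return the Canonical_Combining_Class of each character in text."""
    return [unicodedata.combining(ch) for ch in text]


def canonically_ordered(text):
    """Decompose text and sort adjacent marks by combining class (NFD)."""
    return unicodedata.normalize("NFD", text)


# Marks entered with the higher class first.
sample = "a" + ACUTE + DOT_BELOW
reordered = canonically_ordered(sample)

print(combining_classes(sample))     # [0, 230, 220]
print(combining_classes(reordered))  # [0, 220, 230]

# An Indic dependent vowel, which has combining class 0 in current data.
print(unicodedata.combining("\u093f"))  # 0 (DEVANAGARI VOWEL SIGN I)
```

Marks with class 0 are never reordered, which is why moving the Indic dependent vowels to class 0 changed how sequences containing them are normalized.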

Appendix E, Block Names, provided a formal listing of all the block ranges and block names in 1.1. It was turned into a data file for Unicode 2.0. See Blocks-1.txt.

Appendix F, FSS-UTF was the statement of the precursor transformation format which was supplanted by UTF-8. UTF-8 was specified in Amendment 1 to ISO/IEC 10646-1:1993 and was quickly adopted by the Unicode Standard to replace FSS-UTF.

Appendix G, Symmetric Swapping Characters was the list of "symmetric characters as defined by 10646." In effect, this was the list of characters subject to mirroring as a result of what later became the Unicode Bidirectional Algorithm. This list was carried into Section 4.7, Mirrored of Unicode 2.0.
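The mirroring property that grew out of this list can still be queried today. The sketch below is a hedged illustration using Python's standard `unicodedata.mirrored`, which reports the mirrored property from the Unicode version shipped with the interpreter rather than the Unicode 1.1 list; the helper name `is_mirrored` is an assumption made for this example.

```python
import unicodedata


def is_mirrored(ch):
    """Return True if ch is flagged as mirrored in bidirectional text."""
    return unicodedata.mirrored(ch) == 1


# Paired punctuation is mirrored; ordinary letters are not.
for ch in "()<>[]a":
    print(repr(ch), is_mirrored(ch))
```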

Appendix H, New Characters started with Section 8 of the Unicode 1.0.1 Notice, but was then filled out with a complete listing of all other additions, including an explicit list of the added conjoining jamo for Korean and the long list of APL functional symbol additions. The text of this appendix then became the seed for the text of the New Characters Added subsection of Section D.2, Changes from Unicode 1.0 to 1.1, in Unicode 2.0.

Appendix I, Unicode 1.1 Character List, presented the list of all Unicode 1.1 character names and decomposition information in an idiosyncratic three-column tabular format, without representative glyphs. That format was simply a compromise to allow quick publication. It was replaced with the familiar Unicode code charts format in Unicode 2.0.

Redistribution of Text from 1.1 to 2.0

Section in 1.1	Section in 2.0	Pages in 2.0
2.0, Changes in Unicode 1.0	D.2, Changes from Unicode 1.0 to 1.1	D-1 to D-4
3.1, Double Non-Spacing Marks	Diacritics Positioned Over Two Base Characters	6-14
3.2, Zero-Width Joining	The Non-Joiner and Joiner	6-70 to 6-72
3.3, Byte Order Mark	Byte Order Mark, Zero Width No-Break Space	6-131
3.4, Additional Alternate Format Characters	Alternate Format Characters	6-72 to 6-74
4.1, Fraction Slash	Fraction Slash	6-70
4.2, Dotful I	Diacritics on i	6-7
4.3, Surrounding Non-Spacing Marks	Figure 3-1, Enclosing Marks	3-9
4.4, Unicode Character Equivalence	Canonical Ordering	3-10 to 3-11
5.0, Conjoining Korean Jamos	3.10, Combining Jamo Behavior	3-11 to 3-12
B, Han Compatibility Mappings	CJK Compatibility Ideographs	6-123
C.2, Collation [of Korean Jamos]	Collation [of Hangul Jamo]	6-62
D, Canonical Ordering Priorities	Table 4-3, Combining Classes	4-3 to 4-10
E, Block Names	Blocks-1.txt in the UCD
F, FSS-UTF	A.2, UTF-8 [as adjusted for Amd 1]	A-7 to A-11
G, Symmetric Swapping Characters	Table 4-7, Mirrored Characters	4-22 to 4-25
H, New Characters	D.2, Changes from Unicode 1.0 to 1.1	D-4 to D-6