Origin of the U+nnnn notation
Next message: Hans Aberg: "Re: �land"
- Previous message: Antoine Leca: "Re: Origin of the U+nnnn notation"
- Maybe in reply to: Jukka K. Korpela: "Origin of the U+nnnn notation"
- Next in thread: Hans Aberg: "Re: Origin of the U+nnnn notation"
- Reply: Hans Aberg: "Re: Origin of the U+nnnn notation"
- Reply: Johannes Bergerhausen: "Re: Origin of the U+nnnn notation"
Antoine Leca noted:
> I also remember asking about the introduction of the U+xxxxx and U+10xxxx
> notation, perhaps in year 2000, and to be so confirmed by Dr. Whistler;
> unfortunately my file archives are pretty bad, and I cannot found the post
> right now (well, the interessant one here is Ken's answer, not mine); I did
> not even remember if it was on this list, silly me.
This might be recalling a rather long thread from May 2000,
regarding \uxxxx notation, in which Antoine participated
and Markus Scherer concluded:
> We have a winner: the new (draft) C _and_ C++ standards are introducing
> \uhhhh (fixed-length, 4 hex digits) and
> \Uhhhhhhhh (fixed-length, 8 hex digits)
>
> while Perl and Kermit are using
> \x{hh...h} (variable-length, hex digits, I guess 1..8 of them)
But I don't spot anything in that thread about the history of the U+xxxx
notation per se.
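For anyone comparing the escape forms Markus listed, here is a minimal Python sketch that renders a code point in each style. The function names are hypothetical, and uppercase hex digits are my own choice rather than anything the thread specified. It only formats strings; it does not check which escapes a given compiler or interpreter will accept.

```python
def c_style_escape(cp: int) -> str:
    """Format a code point as \\uhhhh (4 hex digits) or \\Uhhhhhhhh (8 hex digits)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"code point out of range: {cp:#x}")
    # The fixed-length 4-digit form only covers values up to FFFF;
    # anything larger needs the fixed-length 8-digit form.
    if cp <= 0xFFFF:
        return f"\\u{cp:04X}"
    return f"\\U{cp:08X}"


def brace_style_escape(cp: int) -> str:
    """Format a code point in the variable-length \\x{h...h} style."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"code point out of range: {cp:#x}")
    return f"\\x{{{cp:X}}}"


if __name__ == "__main__":
    for cp in (0x41, 0x228E, 0x1F600):
        print(c_style_escape(cp), brace_style_escape(cp))
```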
The use of the U+xxxx notation in publications goes back to
Unicode 1.0 (1991), where it was explicitly used, and explained on
p. xv:
"An individual Unicode value is expressed as U+nnnn, where
nnnn is a four digit number in hexadecimal notation, ..."
The usage appears in draft documents from late 1989,
so the convention itself dates back to then.
The introduction of the short identifiers in ISO/IEC 10646 was
in part an attempt to grandfather this usage into 10646 and make it
recognized and valid for the Unicode Standard as an implementation
of 10646. The initial edition of 10646-1:1993 did not have them,
and simply used 4-digit hex or 8-digit hex for UCS-2 or
UCS-4, respectively.
10646-1:2000 (the 'second edition') added short identifiers
in clause 6.5, defined as:
"The full syntax of the notation of a short identifier, in
Backus-Naur form, is:
{U | u}[{+}xxxx | {-}xxxxxxxx] "
The formal source for that was Amd 9 to 10646-1:1993. That
amendment was initiated in response to a liaison report from
SC22 to SC2, dated September 22, 1995, requesting that 10646
add short unique identifiers for use by other standards.
PDAM 9 was issued in April 1996, and Amd 9 was published in 1997.
The specification has since been modified to:
"The full syntax of the notation of a short identifier, in
Backus-Naur form, is:
{U | u}[{+}(xxxx | xxxxx | xxxxxx) | {-}xxxxxxxx] "
This modification was to account for practice
that uses 5- and 6-digit forms for the supplementary characters
(U+10000..U+10FFFF).
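As a rough illustration of the modified syntax, here is a small Python sketch that recognizes short identifiers. It assumes one reading of the notation: curly braces mark an optional element, so the "U"/"u" prefix and the "+" or "-" sign may each be omitted, and the x's are hexadecimal digits in either case. The function name and the regular expression are mine, not anything from 10646.

```python
import re
from typing import Optional

# Optional U/u, then either an optional "+" with 4 to 6 hex digits
# or an optional "-" with exactly 8 hex digits (assumed reading of the syntax).
_SHORT_ID = re.compile(r"[Uu]?(?:\+?([0-9A-Fa-f]{4,6})|-?([0-9A-Fa-f]{8}))")


def parse_short_identifier(text: str) -> Optional[int]:
    """Return the numeric value of a short identifier, or None if it does not match."""
    m = _SHORT_ID.fullmatch(text)
    if m is None:
        return None
    digits = m.group(1) or m.group(2)
    return int(digits, 16)


if __name__ == "__main__":
    for s in ("U+228E", "u+10FFFF", "U-0001F600", "0041", "U+41"):
        print(s, parse_short_identifier(s))
```

The sketch checks only the shape of the identifier; it makes no attempt to verify that the value is an assigned character or falls within U+0000..U+10FFFF.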
What is not generally known is that the "U+" convention itself
was an ASCII-fied compromise for what the Unicode designers
*really* wanted to use for the Unicode hexadecimal prefix,
which was U+228E MULTISET UNION (whose glyph is a union sign
with a plus sign in it). That symbol can actually be spotted in
some of the early Unicode collateral (T-shirts, stationery,
business cards, etc.), because it was used as part of the original
Unicode logo design, before the switch to the now ubiquitous
Uni design that has been used for more than a decade.
The semantic appropriateness of MULTISET UNION as a designator
for Unicode code points ought to be apparent, and the shape of
the union symbol itself was iconic for the "U" of Unicode. But
use of the symbol in data files and documentation in the
early days was problematic, of course, and it soon gave way
to the much more practical use of "U+" instead.
--Ken
P.S. This tale is part of the story to be written for U+228E.
This archive was generated by hypermail 2.1.5: Tue Nov 08 2005 - 14:42:58 CST