Unicode Mail List Archive: RE: How many characters?
Next message: Jony Rosenne: "RE: Hebrew script in IDN"
- Previous message: Peter Constable: "RE: ISO 15924: Different Arabic scripts?"
- Maybe in reply to: Otto Stolz: "How many characters?"
- Next in thread: Kenneth Whistler: "Re: How many characters?"
[I replied earlier, but that response seems to have gotten lost.]
I think both you and Ken are wrong re 4.1. For the BMP, I did a hand count of Cf characters, and came up with 33, not 31 or 35. I also did counts on various categories of graphic characters and got the following:
Alphabetics, Symbols: 12,497
Han (URO): 20,924
Han Extension A: 6,582
Han Compatibility: 467
Hangul Syllables: 11,172
Total Graphic characters: 51,642
Thus, I get the following for 4.1:
Unicode 4.1:
51642 graphic characters assigned (BMP)
33 format control characters assigned (BMP)
65 control characters assigned (BMP)
6400 private use characters assigned (BMP)
2048 surrogate code points designated (BMP)
34 noncharacter code points designated (BMP)
5314 reserved code points (BMP)
45875 graphic characters assigned (supplementary planes)
105 format characters assigned (supplementary planes)
131068 private use characters assigned (supplementary planes)
32 noncharacter code points designated (supplementary planes)
871496 reserved code points (supplementary planes)
------------------------------------------------------------------
1114112 code points altogether
Peter Constable
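
A hand count like the one described above can be cross-checked mechanically. The following is a minimal sketch, not the method used in this thread: it tallies General_Category values over the BMP with Python's standard `unicodedata` module. The helper name `category_counts` is hypothetical. Note that Python bundles a much later version of the UCD than 4.1, so the Cf figure it prints will not match the 4.1 counts discussed here; only version-independent categories (Cc, Co, Cs) are expected to agree. The Cn bucket also lumps noncharacters together with reserved code points.

```python
import unicodedata
from collections import Counter


def category_counts(first, last):
    """Count code points in [first, last] by General_Category.

    Hypothetical helper for illustration; it uses whatever UCD version
    the running Python interpreter ships with, not Unicode 4.1.
    """
    return Counter(
        unicodedata.category(chr(cp)) for cp in range(first, last + 1)
    )


bmp = category_counts(0x0000, 0xFFFF)

print("UCD version in use:", unicodedata.unidata_version)
print("Cf (format):", bmp["Cf"])            # version-dependent
print("Cc (control):", bmp["Cc"])           # stable across versions
print("Co (private use):", bmp["Co"])       # stable across versions
print("Cs (surrogate):", bmp["Cs"])         # stable across versions
print("Cn (reserved + noncharacters):", bmp["Cn"])
```

To reproduce the 4.1 figures exactly, one would presumably have to parse the UnicodeData.txt file of that release instead of relying on the interpreter's built-in tables.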
> -----Original Message-----
> From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
> Behalf Of Andrew West
> Sent: Wednesday, November 23, 2005 4:26 AM
> To: unicode@unicode.org
> Subject: Re: How many characters?
>
> On 22/11/05, Kenneth Whistler <kenw@sybase.com> wrote:
> >
> > Unicode 4.1:
> >
> > 51644 graphic characters assigned (BMP)
> > 31 format control characters assigned (BMP)
> > 65 control characters assigned (BMP)
> > 6400 private use characters assigned (BMP)
> > 2048 surrogate code points designated (BMP)
> > 34 noncharacter code points designated (BMP)
> > 5314 reserved code points (BMP)
> > 45980 graphic characters assigned (supplementary planes)
> > 131068 private use characters assigned (supplementary planes)
> > 32 noncharacter code points designated (supplementary planes)
> > 871496 reserved code points (supplementary planes)
> > ------------------------------------------------------------------
> > 1114112 code points altogether
> >
> > Unicode 5.0:
> >
> > 51986 graphic characters assigned (BMP)
> > 31 format control characters assigned (BMP)
> > 65 control characters assigned (BMP)
> > 6400 private use characters assigned (BMP)
> > 2048 surrogate code points designated (BMP)
> > 34 noncharacter code points designated (BMP)
> > 4972 reserved code points (BMP)
> > 47007 graphic characters assigned (supplementary planes)
> > 131068 private use characters assigned (supplementary planes)
> > 32 noncharacter code points designated (supplementary planes)
> > 870469 reserved code points (supplementary planes)
> > ------------------------------------------------------------------
> > 1114112 code points altogether
> >
>
> Ken may perhaps have forgotten that the 4.0 figures wrongly count five
> format characters as graphic characters, and so after adjusting for
> the longstanding out by two error the 4.1 figures for format
> characters are still out by four due to the change in GC of U+200B to
> Cf in 4.0.1. By my calculations the correct values for 4.1 are:
>
> Unicode 4.1:
>
> 51640 graphic characters assigned (BMP)
> 35 format control characters assigned (BMP)
> 65 control characters assigned (BMP)
> 6400 private use characters assigned (BMP)
> 2048 surrogate code points designated (BMP)
> 34 noncharacter code points designated (BMP)
> 5314 reserved code points (BMP)
> 45875 graphic characters assigned (supplementary planes)
> 105 format characters assigned (supplementary planes)
> 131068 private use characters assigned (supplementary planes)
> 32 noncharacter code points designated (supplementary planes)
> 871496 reserved code points (supplementary planes)
> ------------------------------------------------------------------
> 1114112 code points altogether
>
> Based on the latest publicly available version of the 5.0 UCD data, I
> get the following figures for 5.0. My figures have two less BMP and
> two more SMP characters than Ken's figures, but I haven't
> cross-checked with N2991 yet (N2991 states there are 1,359 new
> characters, but this must be a typo for 1,369), so I'm not sure who's
> correct.
>
> Unicode 5.0:
>
> 51980 graphic characters assigned (BMP)
> 35 format control characters assigned (BMP)
> 65 control characters assigned (BMP)
> 6400 private use characters assigned (BMP)
> 2048 surrogate code points designated (BMP)
> 34 noncharacter code points designated (BMP)
> 4974 reserved code points (BMP)
> 46904 graphic characters assigned (supplementary planes)
> 105 format characters assigned (supplementary planes)
> 131068 private use characters assigned (supplementary planes)
> 32 noncharacter code points designated (supplementary planes)
> 870467 reserved code points (supplementary planes)
> ------------------------------------------------------------------
> 1114112 code points altogether
>
> Andrew
>
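
Each tally quoted above has to account for every code point: 65,536 in the BMP and 16 × 65,536 in the supplementary planes, 1,114,112 in all. The sketch below is an illustrative check only; the dictionary layout and the names `tallies`, `is_consistent`, and `newly_assigned` are assumptions introduced here. It verifies that Ken's and Andrew's quoted figures each sum correctly, and derives the number of newly assigned characters in 5.0 from the drop in reserved code points.

```python
PLANE_SIZE = 0x10000                   # 65,536 code points per plane
TOTAL_CODE_POINTS = 17 * PLANE_SIZE    # 1,114,112

# Figures copied from the quoted messages; reserved counts are listed last.
tallies = {
    "Ken 4.1": {
        "bmp": [51644, 31, 65, 6400, 2048, 34, 5314],
        "supp": [45980, 131068, 32, 871496],
    },
    "Ken 5.0": {
        "bmp": [51986, 31, 65, 6400, 2048, 34, 4972],
        "supp": [47007, 131068, 32, 870469],
    },
    "Andrew 4.1": {
        "bmp": [51640, 35, 65, 6400, 2048, 34, 5314],
        "supp": [45875, 105, 131068, 32, 871496],
    },
    "Andrew 5.0": {
        "bmp": [51980, 35, 65, 6400, 2048, 34, 4974],
        "supp": [46904, 105, 131068, 32, 870467],
    },
}


def is_consistent(tally):
    """True if the BMP and supplementary-plane rows sum to the plane sizes."""
    return (
        sum(tally["bmp"]) == PLANE_SIZE
        and sum(tally["supp"]) == 16 * PLANE_SIZE
    )


def newly_assigned(old, new):
    """Characters added between two versions, taken as the drop in reserved."""
    return (old["bmp"][-1] - new["bmp"][-1]) + (old["supp"][-1] - new["supp"][-1])


for name, tally in tallies.items():
    print(name, "consistent:", is_consistent(tally))

print("Ken 4.1 -> 5.0:", newly_assigned(tallies["Ken 4.1"], tallies["Ken 5.0"]))
print("Andrew 4.1 -> 5.0:",
      newly_assigned(tallies["Andrew 4.1"], tallies["Andrew 5.0"]))
```

All four tallies sum correctly, and both pairs give 1,369 newly assigned characters for 5.0, so the disagreement between the two sets of figures lies in how characters are distributed among the rows rather than in the totals.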
This archive was generated by hypermail 2.1.5: Wed Nov 23 2005 - 10:20:58 CST