[Python-3000] string module trimming

Jim Jewett jimjjewett at gmail.com
Sat Apr 28 03:06:06 CEST 2007

Previous message: [Python-3000] string module trimming
Next message: [Python-3000] [Python-Dev] python3k change to slicing

On 4/27/07, Jeffrey Yasskin <jyasskin at gmail.com> wrote:

On 4/18/07, Jim Jewett <jimjjewett at gmail.com> wrote:

> Agreed. But there aren't 40K (alphabetic) letters in any particular locale. Most individual languages will have fewer than 100.
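To put rough numbers on that contrast, here is a quick sketch comparing every code point that str.isalpha() accepts with one locale's alphabet. The German letter set is a hand-written, hypothetical example (ASCII letters plus umlauts and sharp s), not CLDR data, and the total depends on the Unicode version of the interpreter.

```python
import string
import sys

# Every code point that Python's str.isalpha() treats as alphabetic,
# across the whole range this interpreter's Unicode database covers.
total_alpha = sum(1 for cp in range(sys.maxunicode + 1) if chr(cp).isalpha())

# Hypothetical per-locale alphabet: ASCII letters plus the extra letters
# German text typically uses. Illustrative only, not taken from CLDR.
german_letters = set(string.ascii_letters + "äöüÄÖÜß")

print(total_alpha)          # well over 40,000; exact value depends on Unicode version
print(len(german_letters))  # 59
```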

Here's a relevant bunch of data from the CLDR: http://www.unicode.org/cldr/data/charts/bytype/misc.exemplarCharacters.html

http://www.unicode.org/Public/UNIDATA/Scripts.txt is also relevant, but I can't quite interpret it.

There are 5020 "Common" code points. These are mostly non-letters, but I suppose they could appear in some languages.
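For reading Scripts.txt: each data line is either a single code point or an inclusive start..end range (hex), then a semicolon, the script name, and a trailing "#" comment. A rough sketch of tallying code points per script under that assumed format; the sample lines are an illustrative excerpt written in the file's format, not the whole file, so the totals are not the 5020 and 1070 figures here (those also vary with the Unicode version).

```python
from collections import Counter

def count_code_points_by_script(lines):
    """Tally code points per script from lines in Scripts.txt format.

    A data line looks like "01C4..0293    ; Latin # L& [208] ...":
    a code point or inclusive range, a semicolon, the script name,
    and an optional comment after '#'.
    """
    counts = Counter()
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue  # blank or comment-only line
        cp_field, script = (part.strip() for part in data.split(";"))
        if ".." in cp_field:
            start, end = (int(x, 16) for x in cp_field.split(".."))
        else:
            start = end = int(cp_field, 16)
        counts[script] += end - start + 1  # ranges include both endpoints
    return counts

# Illustrative excerpt; a real run would iterate over the whole file.
SAMPLE = """\
# Scripts.txt excerpt (illustrative)
0000..001F    ; Common # Cc  [32] <control-0000>..<control-001F>
0020          ; Common # Zs       SPACE
0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; Latin # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
01C4..0293    ; Latin # L& [208] LATIN CAPITAL LETTER DZ WITH CARON..LATIN SMALL LETTER EZH WITH CURL
""".splitlines()

counts = count_code_points_by_script(SAMPLE)
print(counts["Common"], counts["Latin"])  # 33 260
```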

Latin script has 1070 characters; most Latin-script languages use only a small fraction of them. The standard ASCII alphabet is still only 26 lowercase + 26 uppercase letters, but plenty of other characters get used in some language or other. (The largest single contiguous range is 208 letters, from LATIN CAPITAL LETTER DZ WITH CARON to LATIN SMALL LETTER EZH WITH CURL.)
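That range can be checked with the standard unicodedata module. The endpoints U+01C4 and U+0293 are an assumption here, taken from the corresponding Scripts.txt entry rather than stated above.

```python
import unicodedata

# Assumed endpoints of the 208-letter Latin range (from Scripts.txt).
START, END = 0x01C4, 0x0293

print(unicodedata.name(chr(START)))  # LATIN CAPITAL LETTER DZ WITH CARON
print(unicodedata.name(chr(END)))    # LATIN SMALL LETTER EZH WITH CURL
print(END - START + 1)               # 208 (both endpoints inclusive)

# Every code point in the range has a letter general category (Lu, Ll, Lt, ...).
print(all(unicodedata.category(chr(cp)).startswith("L")
          for cp in range(START, END + 1)))  # True
```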

-jJ

