[Python-Dev] PEP 393 Summer of Code Project

Guido van Rossum guido at python.org
Thu Sep 1 17:45:14 CEST 2011


On Thu, Sep 1, 2011 at 12:13 AM, Stephen J. Turnbull <stephen at xemacs.org> wrote:

> Where I cut your words, we are in 100% agreement.  (FWIW :-)

Not quite the same here, but I don't feel the need to have the last word. Most of what you say makes sense; in some cases we'll quibble later, but there are a few points where I have something to add:

> No, and I can tell you why!  The difference between characters and words is much more important than that between code point and grapheme cluster for most users and the developers who serve them.  Even small children recognize typographical ligatures as being composite objects,

True -- in fact I didn't know that ff and ffl ligatures existed until I learned about Unix troff.

> while at least this Spanish-as-a-second-language learner was taught that 'ñ' is an atomic character represented by a discontiguous glyph, like 'i', and it is no more related to 'n' than 'm' is.  Users really believe that characters are atomic.  Even in the cases of Han characters and Hangul, users think of the characters as being "atomic," but in the sense of Bohr rather than that of Democritus.

Ah, I think this may very well be culture-dependent. In Holland there are no Dutch words that use accented letters, but the accents are known because there are a lot of words borrowed from French or German. We (the Dutch) think of these as letters with accents and in fact we think of the accents as modifiers that can be added to any letter (at least I know that's how I thought about it -- perhaps I was also influenced by the way one had to type those on a mechanical typewriter). Dutch does have one native use of the umlaut (though it has a different name, I forget which, maybe trema :-), when there are two consecutive vowels that would normally be read as a special sound (diphthong?). E.g. in "koe" (cow) the oe is two letters (not a single letter formed of two distinct shapes!) that mean a special sound (roughly KOO). But in a word like "coëxistentie" (coexistence) the o and e do not form the oe-sound, and to emphasize this to Dutch readers (who believe their spelling is very logical :-), the official spelling puts the umlaut on the e. This is definitely thought of as a separate mark added to the e; ë is not a new letter. I have a feeling it's the same way for the French and Germans, but I really don't know. (Antoine? Georg?)
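
The "accent as a modifier" view and the "atomic character" view both have a counterpart in Unicode: ë and ñ each exist as a single precomposed code point, and each can also be written as a base letter followed by a combining mark. A minimal sketch of that correspondence, assuming Python 3 and using only the standard unicodedata module (the variable and function names are purely illustrative):

```python
import unicodedata

def decompose(ch):
    """Return the canonical decomposition (NFD) of a string."""
    return unicodedata.normalize("NFD", ch)

e_trema = "\u00eb"   # ë as one precomposed code point
n_tilde = "\u00f1"   # ñ as one precomposed code point

e_parts = decompose(e_trema)   # 'e' followed by a combining diaeresis
n_parts = decompose(n_tilde)   # 'n' followed by a combining tilde

# One code point in composed form, two in decomposed form.
print(len(e_trema), len(e_parts))
print([unicodedata.name(c) for c in e_parts])
print([unicodedata.name(c) for c in n_parts])

# Recomposing (NFC) gives back the single precomposed code point.
print(unicodedata.normalize("NFC", e_parts) == e_trema)
```

Both forms render as the same grapheme, so which one counts as "the character" is a matter of normalization, not of what the reader sees.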

Finally, my guess is that the Spanish emphasis on ñ as a separate letter has to do with teaching how it has a separate position in the localized collation sequence, doesn't it? I'm also curious if ñ occurs as a separate character on Spanish keyboards.

--
--Guido van Rossum (python.org/~guido)


