[Python-3000] Support for PEP 3131 (original) (raw)

Jim Jewett jimjjewett at gmail.com
Tue May 22 01:29:36 CEST 2007


On 5/21/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:

> Would it be acceptable to create an encoding such that you could read > and write

> Löwis

> in your editor, but upon import, python treated it as though you had > writtten

> LU246wis

> Other modules would see LU246wis, unless they also used that encoding > -- in which case the user should also see Löwis while editing.

What problem would that solve? You could type the identifier that way - but you would need to know already that this is the identifier you want to type; how do you know?

(1) If I am using the module based on its documentation, or based on opening it up and reading it, then I can use the same encoding, and I can write Löwis.

(2) If I do arbitrary introspection, such as

import sys
for k, v in sys.modules:
    if v:
        print dir(v)

then I will get something usable, though perhaps not easily readable.

(3) The mapping is reversible, so I can work interactively with the arbitrary characters by setting my console/idle preferences to the special encoding.

Again, I'm uncertain what the use case here would be. For "proper" transliteration, users can memorize easily what the transliterated name would be, and visually identify the two representations.

For two latin-based alphabets, yes. I'm not so sure for non-western scripts.

As you pointed out, the correct transliteration may depend on the natural language (instead of just the character code point), which means we probably can't do it automatically.

It also has to be a one-way transliteration; if ö -> o (or oe) then an o (or oe) in the result can't always be transliterated back.

With a "numeric transliteration", users would not normally be able to tell what a transliterated character means, or how to transliterate a given character.

(1) They shouldn't ever need to see the numeric version unless they're intentionally peeking under the covers, or their site doesn't have the appropriate encoding installed. One advantage of this method is that a single transliteration method could work for any language, so it probably would be installed already.

(2) Even if users did somehow see the numeric version, it wouldn't be that awful. For the langauges close enough to ASCII that a transliteration is straightforward, the number of extra characters to memorize is fairly small.

> If the above is not acceptable, and even the internal representation > has to be readable, then would it be acceptable to make the > transliteration strategy something the user could set, similar to > today's coding: directive?

Then I don't understand your above proposal. I thought you were proposing to replace all non-ASCII characters with some ASCII form on import of the module. What do you mean by "readable internal representation"?

This alternative would let an individual user say "I'm writing Swedish; turn my ö into an o." The actual identifiers used by Python itself would be more readable, but the downside is that users would have to read them more often, instead of using/editing/viewing strictly in the untransliterated version.

-jJ



More information about the Python-3000 mailing list