[Python-Dev] Unicode debate

Ka-Ping Yee <ping@lfw.org>
Wed, 3 May 2000 01:50:30 -0700 (PDT)


On Wed, 3 May 2000, Fredrik Lundh wrote:

> Guido van Rossum <guido@python.org> wrote:
> > But there must be a way to turn on Unicode-awareness on e.g. stdout
> > and then printing a Unicode object should not use str() (as it
> > currently does).
>
> to throw some extra gasoline on this, how about allowing str() to
> return unicode strings?

You still need to print them somehow. One way or another, stdout is still just a stream with bytes on it, unless we augment file objects to understand encodings.

stdout sends bytes to something -- and that something will interpret the stream of bytes in some encoding (could be Latin-1, UTF-8, ISO-2022-JP, whatever). So either (both options are sketched below):

1.  You explicitly downconvert to bytes, and specify
    the encoding each time you do.  Then write the
    bytes to stdout (or your file object).

2.  The file object is smart and can be told what
    encoding to use, and Unicode strings written to
    the file are automatically converted to bytes.
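
Concretely, the two options might look something like this
(EncodingWriter is a made-up name for illustration, not an
existing type):

    import sys

    # Option 1: explicitly downconvert, naming the encoding
    # on every write.
    u = u"caf\u00e9"
    sys.stdout.write(u.encode("utf-8"))
    sys.stdout.write("\n")

    # Option 2: a file object that has been told its encoding
    # once, and converts unicode strings on the way out.
    class EncodingWriter:
        def __init__(self, stream, encoding):
            self.stream = stream
            self.encoding = encoding
        def write(self, s):
            if isinstance(s, unicode):
                s = s.encode(self.encoding)  # unicode -> bytes
            self.stream.write(s)             # 8-bit strings pass through

    out = EncodingWriter(sys.stdout, "utf-8")
    out.write(u"caf\u00e9\n")  # converted automatically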

Another thread mentioned having separate read/write and binary_read/binary_write methods on files. I suggest doing it the other way, actually: since read/write operate on byte streams now, they are the binary operations; the new methods should be the ones that do the extra encoding/decoding work, and could be called uniread/uniwrite, uread/uwrite, textread/textwrite, etc.
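
Under that naming, read/write stay exactly as they are today, and the
new methods carry the codec work. A rough sketch (UnicodeFile is a
hypothetical wrapper, and multibyte sequences split across a buffer
boundary are glossed over):

    import codecs

    class UnicodeFile:
        def __init__(self, file, encoding):
            self.file = file
            self.encoding = encoding
            # codecs.lookup gives (encoder, decoder, reader, writer)
            self.decode = codecs.lookup(encoding)[1]

        def write(self, bytes):    # binary, unchanged
            self.file.write(bytes)

        def uwrite(self, ustr):    # text: encode, then write bytes
            self.file.write(ustr.encode(self.encoding))

        def read(self, size=-1):   # binary, unchanged
            return self.file.read(size)

        def uread(self, size=-1):  # text: read bytes, then decode
            data, consumed = self.decode(self.file.read(size))
            return data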

> (extra questions: how about renaming "unicode" to "string", and
> getting rid of "unichr"?)

Would you expect chr(x) to return an 8-bit string when x < 128, and a Unicode string when x >= 128?
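
To make the question concrete: with unichr() gone, a single chr()
would presumably have to behave something like this (chr_unified is
a hypothetical name, in current-Python terms):

    def chr_unified(x):
        # values that fit in ASCII stay 8-bit strings; the rest
        # become unicode strings
        if x < 128:
            return chr(x)      # an 8-bit string, e.g. 'A'
        return unichr(x)       # a unicode string, e.g. u'\xe9'

    assert type(chr_unified(65)) is str
    assert type(chr_unified(233)) is unicode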

-- ?!ng