[Python-Dev] Unicode

Jack Jansen Jack.Jansen@oratrix.com
Mon, 29 Apr 2002 00:05:13 +0200


On Friday, April 26, 2002, at 06:26, Guido van Rossum wrote:

> No syntactic changes, no. But the way we do things would become significantly different. And think of binary I/O vs. textual I/O -- currently, file.read() returns a string. Code dealing with binary files will look significantly different, and old code won't work.

It could be argued that open(..., 'r').read() returns a text string and open(..., 'rb').read() returns a binary blob.
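A minimal sketch of that split, assuming a modern Python 3 interpreter, where text strings (`str`) and binary blobs (`bytes`) are separate types. In the Python 2.2 of this thread, both modes returned the same `str` type. The file name `sample.txt` and its contents are illustrative only.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")

    # Write UTF-8 encoded bytes directly so the on-disk content is identical on every platform.
    with open(path, "wb") as f:
        f.write("caf\u00e9\n".encode("utf-8"))

    # Text mode: read() decodes the file and returns a text string (str).
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()

    # Binary mode: read() returns the raw, undecoded blob (bytes).
    with open(path, "rb") as f:
        blob = f.read()

print(type(text).__name__, repr(text))  # str 'café\n'
print(type(blob).__name__, repr(blob))  # bytes b'caf\xc3\xa9\n'
```

The mode flag alone determines which type comes back. Converting between the two always requires an explicit encoding, here UTF-8.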

If textstrings and blobs become wholly different objects, this shouldn't create too many problems [see below], except for code that opens a file in binary mode and (partially) reads the resulting file expecting text. But that code would need revisiting anyway if the normal textstring became Unicode.
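Here is a sketch of the kind of code that would need revisiting, again assuming Python 3 semantics for separate `str` and `bytes` types. The header `MAGIC v1` and the payload are hypothetical data, and `io.BytesIO` stands in for a file opened with `'rb'`.

```python
import io

# Hypothetical file contents: a text header line followed by a binary payload.
raw = io.BytesIO(b"MAGIC v1\n\x00\x01\x02\x03")

header = raw.readline()  # bytes: b'MAGIC v1\n'

# Code written when read() returned a single string type compares against a text literal.
# With separate text and binary types, bytes never compare equal to str.
naive_match = header == "MAGIC v1\n"

# The revisited version decodes explicitly at the binary/text boundary.
fixed_match = header.decode("ascii") == "MAGIC v1\n"

payload = raw.read()  # the remainder stays binary: b'\x00\x01\x02\x03'

print(naive_match, fixed_match)  # False True
```

By default, the naive comparison fails silently. Running Python 3 with `-b` emits a warning for such bytes/str comparisons, and `-bb` turns the warning into an error. Either flag helps locate this kind of code.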

[here's below] To my surprise, I think that having blobs and textstrings be unrelated objects creates fewer problems than having one be a subtype of the other. At least, every time I try to do the subtyping in my head, I flip back and forth every couple of seconds between textstrings-are-a-subtype-of-general-binary-buffers and binary-buffers-are-a-special-case-of-python-strings. I think having them both be subtypes of a common base type (basestring) might work, but I'm not sure.
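One way to picture the "common base type" layout is a sketch in Python 3, which has no `basestring` builtin. This example assumes a hypothetical abstract class named `BaseString` and registers the unrelated `str` and `bytes` types as virtual subclasses, so that neither type is a subtype of the other.

```python
from abc import ABC

# Hypothetical common base type playing the role of 'basestring':
# it adds no behavior, it only groups the two string-like types.
class BaseString(ABC):
    pass

# Register the builtin text and binary types as virtual subclasses.
BaseString.register(str)
BaseString.register(bytes)

def describe(obj):
    """Classify obj: check the common base first, then the concrete type."""
    if not isinstance(obj, BaseString):
        return "not a string"
    return "text" if isinstance(obj, str) else "binary"

print(describe("abc"), describe(b"abc"), describe(42))  # text binary not a string
print(issubclass(str, bytes), issubclass(bytes, str))   # False False
```

With this design, code that accepts "any string" can test against the shared base, while the text and binary types keep independent behavior. This avoids committing to either direction of subtyping.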