[I18n-sig] Re: [Python-Dev] Unicode debate
Guido van Rossum guido@python.org
Tue, 02 May 2000 08:26:50 -0400
[MAL]
Let's not make the same mistake again: Unicode objects should not be used to hold binary data. Please use buffers instead.
Easier said than done -- Python doesn't really have a buffer data type. Or do you mean the array module? It's not trivial to read a file into an array (although it's possible, there are even two ways). Fact is, most of Python's standard library and built-in objects use (8-bit) strings as buffers.
I agree there's no reason to extend this to Unicode strings.
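The message does not name the two ways of reading a file into an array. A plausible reading, sketched here in present-day Python, is `array.fromfile()` versus reading the whole file and appending its contents (`fromstring()` at the time, `frombytes()` today). The temporary file and sample payload below are made up for illustration.

```python
import array
import os
import tempfile

# Hypothetical sample data, written to a temporary file so the
# example is self-contained.
payload = bytes(range(16))
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

try:
    # Way 1: let the array pull items straight from the open file.
    # With typecode "B" each item is one byte, so the item count
    # equals the file size.
    a = array.array("B")
    with open(path, "rb") as f:
        a.fromfile(f, os.path.getsize(path))

    # Way 2: read the file into a bytes object first, then append
    # those bytes to the array.
    b = array.array("B")
    with open(path, "rb") as f:
        b.frombytes(f.read())
finally:
    os.remove(path)

print(a == b)
print(a.tobytes() == payload)
```

Either way the caller has to know the item count or hold a second copy of the data, which may be part of why this is called "not trivial" above.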
BTW, I think that this behaviour should be changed:
    >>> buffer('binary') + 'data'
    'binarydata'

while:

    >>> 'data' + buffer('binary')
    Traceback (most recent call last):
      File "", line 1, in ?
    TypeError: illegal argument type for built-in operation

IMHO, buffer objects should never coerce to strings, but instead return a buffer object holding the combined contents. The same applies to slicing buffer objects:

    >>> buffer('binary')[2:5]
    'nar'

should preferably be buffer('nar').
Note that a buffer object doesn't hold data! It's only a pointer to data. I can't off-hand explain the asymmetry though.
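The "only a pointer to data" point can be illustrated with `memoryview`, the closest present-day analog of the old `buffer()` built-in. This is a sketch in modern Python, not the API under discussion in this thread: a slice of a view is itself a view onto the same memory, and a view does not silently concatenate with a bytes object.

```python
# A mutable backing store and a view onto it. The view owns no data.
backing = bytearray(b"binary")
view = memoryview(backing)

# Slicing a view yields another view, not a copy of the bytes.
part = view[2:5]
print(type(part).__name__)
print(bytes(part))

# Because the view only points at the backing store, an in-place
# change to the store is visible through the existing slice.
backing[2] = ord("N")
print(bytes(part))

# Concatenation is not defined for views, so there is no implicit
# coercion to a string-like type in either direction.
try:
    view + b"data"
    concat_ok = True
except TypeError:
    concat_ok = False
print(concat_ok)
```

Since the view holds no data of its own, a combined result would need newly allocated storage, which is one way to see why "return a buffer holding the combined contents" is not a natural operation for a pointer-like object.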
--
Hmm, perhaps we need something like a data string object to get this 100% right?!

    >>> d = data("...data...")

or

    >>> d = d"...data..."
    >>> print type(d)
    <type 'data'>
    >>> 'string' + d
    d"string...data..."
    >>> u'string' + d
    d"s\000t\000r\000i\000n\000g\000...data..."
    >>> d[:5]
    d"...da"

etc. Ideally, string and Unicode objects would then be subclasses of this type in Py3K.
Not clear. I'd rather do the equivalent of byte arrays in Java, for which no "string literal" notations exist.
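For comparison, here is a rough sketch of what a Java-style byte array looks like in present-day Python, using the `bytearray` type. That type was added to the language well after this message and is not something the thread had available. It has no literal notation of its own and is always built through a constructor call. The sample values are made up.

```python
# Eight zero bytes, comparable to `new byte[8]` in Java.
buf = bytearray(8)

# Elements are small integers and can be assigned in place.
buf[0] = 0xFF
buf[1:3] = b"\x01\x02"

# Unlike a Java array, a bytearray can grow.
buf.extend(b"tail")

# Text has to be encoded explicitly before it can be combined with
# binary data. UTF-16-LE gives the "s\000t\000..." layout shown in
# the proposal above.
text = "string".encode("utf-16-le")
combined = bytearray(text) + buf

print(len(buf))
print(text)
print(type(combined).__name__)
```

The explicit `encode()` step is the main difference from the proposed `u'string' + d`, which would pick an encoding implicitly.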
--Guido van Rossum (home page: http://www.python.org/~guido/)