
[Python-Dev] Byte literals (was Re: [Python-checkins] Changing string constants to byte arrays ( r55119 - in python/branches/py3k-struni/Lib: codecs.py test/test_codecs.py ))

Guido van Rossum guido at python.org
Mon May 7 19:42:41 CEST 2007


[+python-3000; replies please remove python-dev]

On 5/5/07, Josiah Carlson <jcarlson at uci.edu> wrote:

> "Fred L. Drake, Jr." <fdrake at acm.org> wrote:
> > On Saturday 05 May 2007, Aahz wrote:
> > > I'm with MAL and Fred on making literals immutable -- that's safe and
> > > lots of newbies will need to use byte literals early in their Python
> > > experience if they pick up Python to operate on network data.
> >
> > Yes; there are lots of places where bytes literals will be used the way str
> > literals are today. buffer(b'...') might be good enough, but it seems more
> > than a little idiomatic, and doesn't seem particularly readable.
> >
> > I'm not suggesting that /all/ literals result in constants, but bytes literals
> > seem like a case where what's wanted is the value. If b'...' results in a
> > new object on every reference, that's a lot of overhead for a network
> > protocol implementation, where the data is just going to be written to a
> > socket or concatenated with other data. An immutable bytes type would be
> > very useful as a dictionary key as well, and more space-efficient than
> > tuple(b'...').
>
> I was saying the exact same thing last summer. See my discussion with
> Martin about parsing/unmarshaling. What I expect will happen with bytes
> as dictionary keys is that people will end up subclassing dictionaries
> (with varying amounts of success and correctness) to do something like
> the following...
>
>     class bytesKeys(dict):
>         ...
>         def __setitem__(self, key, value):
>             if isinstance(key, bytes):
>                 key = key.decode('latin-1')
>             else:
>                 raise KeyError("only bytes can be used as keys")
>             dict.__setitem__(self, key, value)
>         ...
>
> Is it optimal? No. Would it be nice to have immutable bytes? Yes. Do I
> think it will really be a problem in parsing/unmarshaling? I don't know,
> but the fact that there now exists a reasonable literal syntax b'...'
> rather than the previous bytes([1, 2, 3, ...]) means that we are coming
> much closer to having what really is about the best way to handle this:
> Python 2.x str.

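The dictionary-key workaround quoted above can be tried out on a current Python 3, where bytes ended up immutable and bytearray became the mutable counterpart. What follows is only a sketch under that assumption, not the behaviour of the py3k-struni branch at the time; the class name BytesKeys and the __getitem__ override are additions made here for illustration.

```python
class BytesKeys(dict):
    """Hypothetical runnable take on the quoted workaround (illustration only)."""

    def __setitem__(self, key, value):
        if not isinstance(key, (bytes, bytearray)):
            raise KeyError("only bytes can be used as keys")
        # latin-1 maps every byte value 0..255 to exactly one code point,
        # so decoding gives a hashable str without losing information.
        dict.__setitem__(self, bytes(key).decode("latin-1"), value)

    def __getitem__(self, key):
        return dict.__getitem__(self, bytes(key).decode("latin-1"))


table = BytesKeys()
table[bytearray(b"GET")] = "handler"   # a mutable buffer used as the key
found = table[b"GET"]                  # looked up with an equal bytes value

# With an immutable bytes type, the subclass is unnecessary.
direct = {b"GET": "handler"}

# A mutable byte buffer is rejected as a key by a plain dict.
try:
    {bytearray(b"GET"): "handler"}
    bytearray_hashable = True
except TypeError:
    bytearray_hashable = False
```

The subclass only covers item assignment and lookup; methods such as update() or get() bypass it, which is the kind of "varying correctness" the quoted message anticipates.
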
I don't know how this will work out yet. I'm not convinced that having both mutable and immutable bytes is the right thing to do; but I'm also not convinced of the opposite. I am slowly working on the string/unicode unification, and so far, unfortunately, it is quite daunting to get rid of 8-bit strings even at the Python level let alone at the C level.

I suggest that the following exercise, to be carried out in the py3k-struni branch, might be helpful: (1) change the socket module to return bytes instead of strings (it already takes bytes, by virtue of the buffer protocol); (2) change its makefile() method so that it uses the new io.py library, in particular the SocketIO wrapper there; (3) fix up the httplib module and perhaps other similar ones. Take copious notes while doing this. Anyone up for this? I will listen! (I'd do it myself but I don't know where I'd find the time).
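For readers who want to see what steps (1) and (2) look like from the caller's side, here is a minimal sketch against the socket module of a current Python 3, where recv() returns bytes and makefile() returns an io-based file object. It assumes a platform where socket.socketpair() is available, and it says nothing about how the 2007 branch was actually changed.

```python
import socket

# A connected pair of sockets stands in for a client and a server.
a, b = socket.socketpair()
try:
    a.sendall(b"HTTP/1.0 200 OK\r\n")
    raw = b.recv(1024)                 # step (1): the socket hands back bytes

    a.sendall(b"Content-Length: 0\r\n")
    with b.makefile("rb") as reader:   # step (2): a buffered io object over the socket
        line = reader.readline()       # still bytes; decoding is left to the caller
finally:
    a.close()
    b.close()
```

Code in the style of httplib, step (3), would then have to decide where protocol bytes get decoded into text, which is where most of the note-taking would presumably happen.
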

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


