[Python-3000] Immutable bytes -- looking for volunteer

Jim Jewett jimjjewett at gmail.com
Wed Sep 26 00:14:19 CEST 2007


> How about we take the existing PyString implementation (Python 2's str, currently still present as str8 in py3k), remove the locale and unicode mixing support, and call it bytes.

Is that just encode/decode? But isn't this one sensible way to store an encoded str, so that decode (only) would still make sense?

I would have expected to drop text- or character-oriented methods, because they should really be done on the (decoded) unicode version. Given the use of bytes in wire protocols, I could also understand saying that these methods only work on ASCII, and either raise an exception or return False for other byte values.

text-or-character-oriented methods:

'capitalize', 'center', 'endswith', 'expandtabs', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper', 'ljust', 'lower', 'lstrip', 'rjust', 'rstrip', 'splitlines', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill'
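
For illustration only, here is a minimal sketch of the "ASCII only" option, using the bytes type as it behaves in released Python 3 (not anything specified in this message). There the predicates return False for non-ASCII byte values and the case conversions leave those bytes untouched, rather than raising. The names samples and ascii_results are made up for the example.

```python
# Illustration using bytes as found in released Python 3; this thread
# predates that design, so treat it as one possible outcome only.
samples = [b"abc", b"ABC", b"abc\xe9", b"\xe9"]

# For each sample, record (isalpha(), upper()).
# Byte 0xE9 is outside ASCII, so it is never "alphabetic" and
# upper() passes it through unchanged.
ascii_results = {s: (s.isalpha(), s.upper()) for s in samples}

for sample, (is_alpha, uppered) in ascii_results.items():
    print(sample, is_alpha, uppered)
```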

> It would mean more fixes beyond what Jeffrey and Adam did, since iterating over a bytes instance would return a bytes instance of length 1 instead of a small int,

makes sense

> and the bytes constructor would change accordingly (no more initializing a bytes object from a list of ints).

Why not?

I expect the literal b"ASCII string" to be the most common constructor, but I don't see the problem with a sequence of ints (or hex) as an alternative constructor.
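
A minimal sketch of the three spellings side by side, assuming the constructors as they exist in released Python 3 (bytes(iterable_of_ints) and bytes.fromhex); whether the proposal under discussion would keep them is exactly the open question here.

```python
# Three ways to build the same two-byte value in released Python 3.
from_literal = b"so"                # the literal, expected to be most common
from_ints = bytes([0x73, 0x6F])     # a sequence of ints in range(256)
from_hex = bytes.fromhex("736f")    # pairs of hex digits

print(from_literal == from_ints == from_hex)
```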

> The (new) buffer object would also have to change to be more compatible with the (new) bytes object -- bytes<-->buffer conversions should be 1-1, and iterating over a buffer instance would also have to return a length-1 buffer (or bytes???) instance.

I would return a bytes instance. If you return a 1-char buffer, and someone does modify that, it isn't clear whether the change should be reflected in the original source buffer. If someone does want an in-place filter, they can always use enumerate and slicing.
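
A rough sketch of what "enumerate and slicing" could look like for an in-place filter. It assumes bytearray (the mutable type in released Python 3) as a stand-in for the proposed buffer, and the helper name upper_ascii_in_place is made up. Note that iteration there yields small ints, not length-1 bytes, so the example converts explicitly.

```python
def upper_ascii_in_place(buf):
    """Uppercase ASCII lowercase letters in buf, modifying it in place."""
    # Iterate over an immutable snapshot so the loop never observes
    # its own writes; each write replaces exactly one byte.
    for i, value in enumerate(bytes(buf)):
        if 0x61 <= value <= 0x7A:                   # ord('a') .. ord('z')
            buf[i:i + 1] = bytes([value - 0x20])    # length-1 slice assignment

data = bytearray(b"some data")
upper_ascii_in_place(data)
print(data)
```

Because every write goes through an explicit slice of the original object, there is no question about whether a change to a length-1 item should propagate back.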

Can we assume that the two types are unequal, but that you can search a buffer for a (constant) bytes?

>>> mybytes = b"some data"
>>> mybuffer = buffer(mybytes)

>>> mybuffer == mybytes
False

>>> mybuffer.startswith(mybytes)  and \
...    mybuffer.endswith(mybytes)  and \
...    len(mybuffer) == len(mybytes)
True
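
A runnable approximation of the session above, again assuming bytearray as a stand-in for the proposed buffer type; same_contents is a made-up helper name. One caveat: in released Python 3 a bytearray and a bytes object with the same contents do compare equal, so the `False` shown above reflects the assumption being asked about, not what that stand-in does.

```python
mybytes = b"some data"
mybuffer = bytearray(mybytes)   # stand-in for buffer(mybytes)

def same_contents(buf, const):
    """Content comparison spelled without ==, as in the session above."""
    return (buf.startswith(const)
            and buf.endswith(const)
            and len(buf) == len(const))

# Searching a mutable buffer for a constant bytes value.
found_at = mybuffer.find(b"data")
contains = b"me da" in mybuffer

print(same_contents(mybuffer, mybytes), found_at, contains)
```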

-jJ


