[Python-Dev] bytes type discussion
Bob Ippolito bob at redivi.com
Wed Feb 15 01:56:00 CET 2006
On Feb 14, 2006, at 4:17 PM, Guido van Rossum wrote:

> On 2/14/06, Bob Ippolito <bob at redivi.com> wrote:
>> On Feb 14, 2006, at 3:13 PM, Guido van Rossum wrote:
>>> - we need a new PEP; PEP 332 won't cut it
>>> - no b"..." literal
>>> - bytes objects are mutable
>>> - bytes objects are composed of ints in range(256)
>>> - you can pass any iterable of ints to the bytes constructor, as long
>>>   as they are in range(256)
>>
>> Sounds like array.array('B').
>
> Sure.
>
>> Will the bytes object support the buffer interface?
>
> Do you want them to?
>
> I suppose they should not support the text part of that API.

I would imagine that it'd be convenient for integrating with existing
extensions... e.g. initializing an array or Numeric array with one.
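
For readers on a current interpreter: the properties listed above (mutable, made of ints in range(256), constructible from any iterable of such ints) match the `bytearray` type that later shipped. The sketch below uses `bytearray` purely as a stand-in, which is an assumption; it is not the type as specified in this thread.

```python
import array

# bytearray stands in for the proposed mutable bytes type (an assumption;
# this thread predates the final design).
data = bytearray([72, 105])      # any iterable of ints in range(256)
data[0] = 104                    # mutable: item assignment works
first = data[0]                  # indexing yields a plain int

# Values outside range(256) are rejected by the constructor.
try:
    bytearray([256])
    rejected = False
except ValueError:
    rejected = True

# array.array('B') holds the same kind of values, as noted above.
arr = array.array('B', [104, 105])

# Buffer interface: an array can be initialized directly from the object.
arr_from_bytes = array.array('B', data)

print(data, first, rejected, arr_from_bytes.tolist())
```
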

>> Will it accept objects supporting the buffer interface in the
>> constructor (or a class method)? If so, will it be a copy or a view?
>> Current array.array behavior says copy.
>
> bytes() should always copy -- thanks for asking.

I only really ask because it's worth fully specifying these things.
Copy seems a lot more sensible given the rest of the interpreter and
stdlib (e.g. buffer(x) seems to always return a read-only buffer).
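
The difference between copying and viewing can be shown on a current interpreter. This is a rough illustration that assumes Python 3's `bytearray` and `memoryview` as stand-ins for the proposed constructor and for a view; neither name comes from this thread.

```python
import array

source = array.array('B', [1, 2, 3])

# Constructing from a buffer-supporting object copies the data ...
copied = bytearray(source)
source[0] = 99
# ... so a later change to the source does not show up in the copy.
print(copied)

# A memoryview, by contrast, refers to the same memory as the source.
view = memoryview(source)
print(view[0])
```
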

>>> - longs or anything with an __index__ method should do, too
>>> - when you index a bytes object, you get a plain int
>>
>> When slicing a bytes object, do you get another bytes object or a
>> list? If it's a bytes object, is it a copy or a view? Current
>> array.array behavior says copy.
>
> Another bytes object which is a copy.
>
> (Why would you even think about views here? They are evil.)

I mention views because that's what numpy/Numeric/numarray/etc.
do... It's certainly convenient at times to have that functionality,
for example, to work with only the alpha channel in an RGBA image.
Probably too magical for the bytes type.

    >>> import numpy
    >>> image = numpy.array(list('RGBARGBARGBA'))
    >>> alpha = image[3::4]
    >>> alpha
    array([A, A, A], dtype=(string,1))
    >>> alpha[:] = 'X'
    >>> image
    array([R, G, B, X, R, G, B, X, R, G, B, X], dtype=(string,1))
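
The same alpha-channel case can be tried without numpy to contrast copy-on-slice with a view. A minimal sketch, assuming Python 3, with `bytearray` standing in for the proposed type and `memoryview` standing in for a view (both are assumptions, not part of the proposal):

```python
image = bytearray(b"RGBARGBARGBA")

# Slicing a bytearray returns a new, independent object (a copy).
alpha_copy = image[3::4]
alpha_copy[0] = ord("X")
unchanged = bytes(image)          # the image is not affected by the write

# Slicing a memoryview returns a view onto the same buffer.
alpha_view = memoryview(image)[3::4]
for i in range(len(alpha_view)):
    alpha_view[i] = ord("X")      # writes through to image

print(unchanged)
print(image)
```

With copy semantics the write to the slice is lost as far as the image is concerned; with the view it changes every fourth byte in place.
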

>>> Very controversial:
>>>
>>> - bytes("abc", "encoding") == bytes("abc") # ignores the "encoding" argument
>>> - bytes(u"abc") == bytes("abc") # for ASCII at least
>>> - bytes(u"\x80\xff") raises UnicodeError
>>> - bytes(u"\x80\xff", "latin-1") == bytes("\x80\xff")
>>>
>>> Martin von Loewis's alternative for the "very controversial" set is to
>>> disallow an encoding argument and (I believe) also to disallow Unicode
>>> arguments. In 3.0 this would leave us with s.encode() as the only way
>>> to convert a string (which is always unicode) to bytes. The problem
>>> with this is that there's no code that works in both 2.x and 3.0.
>>
>> Given a base64 or hex string, how do you get a bytes object out of it?
>> Currently str.decode('base64') and str.decode('hex') are good solutions
>> to this... but you get a str object back.
>
> I don't know -- you can propose an API you like here. base64 is as
> likely to encode text as binary data, so I don't think it's wrong for
> those things to return strings.

That's kinda true I guess -- but you'd still need an encoding in py3k
to turn base64 -> text. A lot of the current codecs infrastructure
doesn't make sense in py3k -- for example, the 'zlib' encoding, which
is really a bytes transform, or 'unicode_escape' which is a text
transform.
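
A rough illustration of that distinction on a current Python 3, using the standard `base64`, `zlib`, and `codecs` modules. The UTF-8 choice and the sample payloads are assumptions made for the example only.

```python
import base64
import codecs
import zlib

# base64 decoding yields bytes; getting text out is a second, explicit
# step that has to name an encoding (UTF-8 is assumed for this payload).
encoded = base64.b64encode("h\u00e9llo".encode("utf-8"))
raw = base64.b64decode(encoded)
text = raw.decode("utf-8")

# zlib is a bytes-to-bytes transform: no text is involved at either end.
payload = b"spam" * 10
compressed = zlib.compress(payload)
restored = zlib.decompress(compressed)

# unicode_escape concerns escape sequences in text, not binary data.
unescaped = codecs.decode(b"caf\\u00e9", "unicode_escape")

print(type(raw).__name__, text, restored == payload, unescaped)
```
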
I suppose there aren't too many different ways you'd want to encode
or decode data to binary (beyond the text codecs), so they should
probably just live in a module -- something like the binascii we have
now. I do find the codecs infrastructure to be convenient at times
(maybe too convenient), but since you're not interested in adding
functions to existing types, a module seems like the best approach.
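
For the hex and base64 question raised earlier, the module route looks roughly like this with today's `binascii`. This is a sketch on Python 3, where these functions return bytes; the sample strings are made up for the example.

```python
import binascii

hex_text = "48656c6c6f"
b64_text = "SGVsbG8="

# Module-level functions turn the encoded text into binary data.
from_hex = binascii.unhexlify(hex_text)
from_b64 = binascii.a2b_base64(b64_text)

# The reverse direction also lives in the module, not on the type.
back_to_hex = binascii.hexlify(from_hex)

print(from_hex, from_b64, back_to_hex)
```
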
-bob