[Python-Dev] bytes type discussion
Bob Ippolito bob at redivi.com
Wed Feb 15 00:35:14 CET 2006
On Feb 14, 2006, at 3:13 PM, Guido van Rossum wrote:
I'm about to send 6 or 8 replies to various salient messages in the PEP 332 revival thread. That's probably a sign that there's still a lot to be sorted out. In the meantime, to save you reading through all those responses, here's a summary of where I believe I stand. Let's continue the discussion in this new thread unless there are specific hairs to be split in the other thread that aren't addressed below or by later posts.
Non-controversial (or almost):
- we need a new PEP; PEP 332 won't cut it
- no b"..." literal
- bytes objects are mutable
- bytes objects are composed of ints in range(256)
- you can pass any iterable of ints to the bytes constructor, as long as they are in range(256)
Sounds like array.array('B').
Will the bytes object support the buffer interface? Will it accept
objects supporting the buffer interface in the constructor (or a
class method)? If so, will it be a copy or a view? Current
array.array behavior says copy.
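
The copy behavior of array.array('B') described above can be checked directly. This is only a minimal sketch, assuming a present-day Python 3 interpreter (which postdates this thread); the variable names are illustrative, and memoryview stands in here for "a view" purely for contrast.

```python
from array import array

# Build an unsigned-byte array from an object that supports the buffer interface.
source = bytearray(b"abc")
copied = array("B", source)

# Mutating the source afterwards leaves the array alone: the constructor copied.
source[0] = ord("z")
print(copied.tolist())

# Values outside range(256) are rejected by the 'B' typecode.
try:
    array("B", [256])
except OverflowError as exc:
    print("rejected:", exc)

# A memoryview, by contrast, shares memory with the array it wraps.
view = memoryview(copied)
view[0] = 0
print(copied.tolist())
```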
- longs or anything with an __index__ method should do, too
- when you index a bytes object, you get a plain int
When slicing a bytes object, do you get another bytes object or a
list? If it's a bytes object, is it a copy or a view? Current
array.array behavior says copy.
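
For reference, the array.array behavior in question looks roughly like this. A small sketch, again assuming Python 3 and using made-up variable names; it says nothing about what the proposed bytes type should do.

```python
from array import array

data = array("B", [10, 20, 30])

# Indexing yields a plain int.
first = data[0]

# Slicing yields another array (not a list), and it is an independent copy.
head = data[0:2]
head[0] = 99

print(type(first).__name__, type(head).__name__)
print(data.tolist(), head.tolist())
```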
- repr(bytes([10, 20, 30])) == 'bytes([10, 20, 30])'
Somewhat controversial:
- it's probably too big to attempt to rush this into 2.5
- bytes("abc") == bytes(map(ord, "abc"))
- bytes("\x80\xff") == bytes(map(ord, "\x80\xff")) == bytes([128, 256])
It would be VERY controversial if ord('\xff') == 256 ;)
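
A quick check of the values involved, as a sketch assuming Python 3 (where the literal is a two-character text string rather than a 2.x str):

```python
text = "\x80\xff"

# map(ord, ...) gives one int per character.
values = list(map(ord, text))
print(values)

# Both values fit in range(256), which is what the constructor would require.
in_range = all(v in range(256) for v in values)
print(in_range)
```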
Very controversial:
- bytes("abc", "encoding") == bytes("abc") # ignores the "encoding" argument
- bytes(u"abc") == bytes("abc") # for ASCII at least
- bytes(u"\x80\xff") raises UnicodeError
- bytes(u"\x80\xff", "latin-1") == bytes("\x80\xff")
Martin von Loewis's alternative for the "very controversial" set is to disallow an encoding argument and (I believe) also to disallow Unicode arguments. In 3.0 this would leave us with s.encode() as the only way to convert a string (which is always unicode) to bytes. The problem with this is that there's no code that works in both 2.x and 3.0.
Given a base64 or hex string, how do you get a bytes object out of
it? Currently str.decode('base64') and str.decode('hex') are good
solutions to this... but you get a str object back.
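
For comparison, here is one way the same conversions can be written today. This is a sketch that assumes Python 3's standard library, which did not exist when this was written; str.decode('hex') from 2.x is not available there, and the sample strings are made up for illustration.

```python
import base64
import binascii
import codecs

hex_text = "0a141e"   # hypothetical sample input: bytes 10, 20, 30 in hex
b64_text = "ChQe"     # the same three bytes in base64

# Both helpers return an immutable bytes object.
from_hex = binascii.unhexlify(hex_text)
from_b64 = base64.b64decode(b64_text)

# The codec-based spelling closest to the 2.x str.decode('hex') idiom.
via_codec = codecs.decode(hex_text.encode("ascii"), "hex")

# Wrap in bytearray when a mutable sequence of ints is wanted.
mutable = bytearray(from_hex)
mutable[0] = 255

print(list(from_hex), list(from_b64), list(via_codec), list(mutable))
```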
-bob