[Python-Dev] Re: adding a bytes sequence type to Python (original) (raw)

James Y Knight foom at fuhm.net
Wed Aug 18 00:04:45 CEST 2004


On Aug 17, 2004, at 5:18 PM, Bob Ippolito wrote:

On Aug 17, 2004, at 5:11 PM, Martin v. Löwis wrote:

Bob Ippolito wrote: How would you embed raw bytes if the string was unicode?

The most direct notation would be bytes("delimited packet\x00") However, people might not understand what is happening, and Guido doesn't like it if the bytes are >127. I guess that was a bad example, what if the delimiter was \xff?

Indeed, if all strings are unicode, the question becomes: what encoding does bytes() use to translate unicode characters to bytes. Two alternatives have been proposed so far:

  1. ASCII (translate chars as their codepoint if < 128, else error)
  2. ISO-8859-1 (translate chars as their codepoint if < 256, else error)

I think I'd choose #2, myself.

I know that map(ord, u'delimited packet\xff') would get correct results.. but I don't think I like that either.

Why would you consider that wrong? ord(u'\xff') should return 255. Just as ord(u'\u1000') returns 4096. There's nothing mysterious there.

James



More information about the Python-Dev mailing list