[Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

Adam Olsen rhamph at gmail.com
Tue Feb 14 08:04:32 CET 2006


On 2/13/06, "Martin v. Löwis" <martin at v.loewis.de> wrote:

> M.-A. Lemburg wrote:
> > We're talking about Py3k here: "abc" will be a Unicode string,
> > so why restrict the conversion to 7 bits when you can have 8 bits
> > without any conversion problems ?
>
> YAGNI. If you have a need for byte string in source code, it will typically
> be "random" bytes, which can be nicely used through
>
>   bytes([0x73, 0x9f, 0x44, 0xd2, 0xfb, 0x49, 0xa3, 0x14, 0x8b, 0xee])
>
> For larger blocks, people should use base64.stringtobytes (which can
> become a synonym for base64.decodestring in Py3k).
>
> If you have bytes that are meaningful text for some application (say, a
> wire protocol), it is typically ASCII-Text. No protocol I know of uses
> non-ASCII characters for protocol information.
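As a rough illustration of the three cases in the quoted text, here is a minimal sketch in present-day Python 3. The name base64.stringtobytes in the quote is a proposal, not an existing function; the sketch assumes base64.b64decode as a stand-in for it, and the variable names are made up for the example.

```python
import base64

# Case 1: a handful of "random" bytes, built from a list of integers
# exactly as written in the quoted message.
key = bytes([0x73, 0x9f, 0x44, 0xd2, 0xfb, 0x49, 0xa3, 0x14, 0x8b, 0xee])

# Case 2: a larger block kept as base64 text in the source and decoded
# at run time. b64decode is an assumed stand-in for the proposed
# base64.stringtobytes.
encoded = base64.b64encode(key).decode("ascii")
decoded = base64.b64decode(encoded)

# Case 3: protocol text that is plain ASCII, encoded explicitly.
request_line = "GET / HTTP/1.1\r\n".encode("ascii")
```

Encoding with "ascii" rather than a wider codec means a stray non-ASCII character raises an error instead of passing through silently.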

What would that imply for repr()? To support eval(repr(x)) it would have to produce whatever format the source code includes to begin with.

If I understand correctly, there are three main candidates:

  1. Direct copying to str in 2.x, pretending it's latin-1 in unicode in 3.x
  2. Direct copying to str/unicode if it's only ascii values, switching to a list of hex literals if there's any non-ascii values
  3. b"foo" literal with ascii for all ascii characters (other than \
    and "), \xFF for individual characters that aren't ascii

Given the choice, I prefer the third option, with the second option as my runner-up. The first option just screams "silent errors" to me.
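To make the three candidates concrete, here is a hedged sketch of what each repr might look like, written against present-day Python 3. The function names are hypothetical, and each function is only one reading of the corresponding option, not a settled design.

```python
def repr_latin1(data: bytes) -> str:
    """Option 1 (one reading): copy each byte to the code point of the
    same value, i.e. pretend the bytes are latin-1 text."""
    return repr(data.decode("latin-1"))


def repr_ascii_or_hex_list(data: bytes) -> str:
    """Option 2 (one reading): a plain string when every byte is ASCII,
    otherwise a bytes([...]) call with a list of hex literals."""
    if all(b < 0x80 for b in data):
        return repr(data.decode("ascii"))
    return "bytes([" + ", ".join(f"0x{b:02x}" for b in data) + "])"


def repr_b_literal(data: bytes) -> str:
    """Option 3 (one reading): a b"..." literal with printable ASCII
    copied as-is, backslash and double quote escaped, and every other
    byte written as \\xNN."""
    parts = []
    for b in data:
        ch = chr(b)
        if ch in '\\"':
            parts.append("\\" + ch)       # escape \ and "
        elif 0x20 <= b < 0x7F:
            parts.append(ch)              # printable ASCII stays readable
        else:
            parts.append(f"\\x{b:02x}")   # control and non-ASCII bytes
    return 'b"' + "".join(parts) + '"'


sample = bytes([0x47, 0x45, 0x54, 0x20, 0xFF])  # "GET " followed by 0xFF
print(repr_latin1(sample))
print(repr_ascii_or_hex_list(sample))
print(repr_b_literal(sample))
```

In this sketch, evaluating the option 1 output gives back a text string rather than bytes, which is the kind of silent type change I'm worried about; the hex-list and b"..." forms both evaluate back to the original bytes.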

-- Adam Olsen, aka Rhamphoryncus


