[Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 3

Daniel Holth dholth at gmail.com
Thu Mar 27 20:18:41 CET 2014


On Thu, Mar 27, 2014 at 2:53 PM, Guido van Rossum <guido at python.org> wrote:

So what's the use case for Python 2/3 compatible code? IMO the main use case for the PEP is simply to be able to construct bytes from a combination of a template and some input that may include further bytes and numbers. E.g. in asyncio when you write an HTTP client or server you have to construct bytes to write to the socket, and I'd be happy if I could write b'HTTP/1.0 %d %b\r\n' % (status, message) rather than having to use str(status).encode('ascii') and concatenation or join().
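For illustration, the two spellings mentioned above can be compared in a short sketch. It assumes Python 3.5 or later, where PEP 461 was eventually implemented; the function names and the sample status and message values are made up for the example and are not taken from asyncio.

```python
def status_line_percent(status, message):
    # PEP 461 style: %d formats the int as ASCII digits, %b inserts bytes as-is.
    return b'HTTP/1.0 %d %b\r\n' % (status, message)


def status_line_join(status, message):
    # The older spelling: encode the number by hand, then join the pieces.
    return b' '.join([b'HTTP/1.0', str(status).encode('ascii'), message]) + b'\r\n'


print(status_line_percent(200, b'OK'))
print(status_line_join(200, b'OK'))
```

Both functions should produce the same bytes; the first keeps the template readable as a single literal.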

It seems to be notoriously difficult to understand or explain why Unicode can still be very hard in Python 3, or in code that is in the middle of being ported or has to run in both interpreters. As far as I can tell, part of it is when a symbol has type (str or bytes) depending on the situation (declared as if we had a static type system with union types); some of it is because incorrect mixing can happen without an exception, only to be discovered later and far away in space and time from the error (worst of all in a serialized file); and part of it is all of the not easily checkable "types" a particular Unicode object has, depending on whether it contains surrogates or code points > n. Sometimes you might simply disagree about whether an API should be returning bytes or Unicode in mildly ambiguous cases like base64 encoding. Sometimes Unicode is just intrinsically complicated.

For me this PEP holds the promise of being able to do work in the bytes domain, with no accidental mixing ever, when I really want bytes. For 2+3 code, mistakes would give me exceptions sometimes in Python 2 and exceptions all the time in Python 3. I hope this is less error prone in strict domains than, for example, u"string processing".encode('latin1'). And I hope that there is very little type(str or int) in HTTP, for example, or in other "legitimate" bytes domains, but I don't know; I suspect that if you have a lot of problems with bytes' %s then it's a clue you should use (u"%s" % (argument)).encode() instead.
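As a rough sketch of that strictness, assuming Python 3.5 or later (where PEP 461 was implemented): mixing a str into a bytes template fails immediately, while the text-domain alternative formats as str and encodes once at the end. The helper name `format_header` and the header values are hypothetical.

```python
def format_header(name, value):
    # Stays in the bytes domain: both arguments must be bytes-like.
    return b'%b: %b\r\n' % (name, value)


print(format_header(b'Content-Type', b'text/plain'))

try:
    # Passing a str where bytes are expected raises TypeError in Python 3.
    format_header(b'Content-Type', 'text/plain')
except TypeError as exc:
    print('TypeError:', exc)

# The text-domain alternative: format as str, then encode once at the boundary.
encoded = (u'%s: %s\r\n' % (u'Content-Type', u'text/plain')).encode('ascii')
print(encoded)
```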

sprintf()'s version of %s just takes a char* and puts it in without doing any type conversion of course. IANACL (I am not a C lawyer).


