[Python-Dev] PEP 460: allowing %d and %f and mojibake

Greg Ewing greg.ewing at canterbury.ac.nz
Sun Jan 12 23:10:59 CET 2014


Paul Moore wrote:

On 12 January 2014 18:26, Ethan Furman <ethan at stoneleaf.us> wrote:

I'm arguing from three PoVs:

1) 2 & 3 compatible code base
2) having the bytes type /be/ the boundary type
3) readable code

The only one of these that I can see being in any way an argument against

def inttobytes(n):
    return str(n).encode('ascii')

b'Content Length: ' + inttobytes(len(binarydata))

is (3),

I think the readability argument becomes a bit sharper when you consider more complex examples, e.g. if I have a tuple of 3 floats that I want to put into a PDF file, then

b"%f %f %f" % my_floats

is considerably clearer than

b" ".join((float_to_bytes(f) for f in my_floats))
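For anyone who wants to compare the two spellings side by side, here is a minimal sketch. It assumes Python 3.5 or later, where PEP 461 added %-interpolation for bytes; the float_to_bytes helper and the sample values are hypothetical, made up purely for illustration.

```python
def float_to_bytes(f):
    # Hypothetical helper: two explicit steps, number -> text -> ASCII bytes.
    text = "%f" % f
    return text.encode("ascii")


# Made-up sample values standing in for the tuple of 3 floats.
my_floats = (1.0, 2.5, 3.25)

# Spelling 1: format each float separately, then join the pieces.
joined = b" ".join(float_to_bytes(f) for f in my_floats)

# Spelling 2: interpolate straight into a bytes template
# (requires Python 3.5+, per PEP 461).
interpolated = b"%f %f %f" % my_floats

print(joined)
print(interpolated)
```

Both spellings should produce the same bytes; the difference is only in how much of the text-then-encode step the reader has to look at.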

My reading of Nick's refusal is that %d takes a value which is semantically a number, converts it into a base-10 representation (which is semantically a string, not a sequence of bytes[1]) and then encodes that string into a series of bytes using the ASCII encoding. That is two semantic transformations, and one (the ASCII encoding) is implicit. Specifically, it's implicit because (a) the normal reading of %d is "produce the base-10 representation of a number", and a base-10 representation is a string, and (b) because nowhere has ASCII been mentioned

It's indicated (I won't say "implied", see below) by the fact that we're interpolating it into a bytes object rather than a string.

This is no more or less implicit than the fact that when we write

b"ABC"

then we're saying that those characters are to be encoded in ASCII, and not EBCDIC or UTF-16 or...
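To make that concrete, here is a small sketch comparing the bytes literal with the same three characters under other encodings. The choice of cp500 as the EBCDIC codec and of little-endian UTF-16 is my own assumption for illustration; nothing above depends on those particular codecs.

```python
literal = b"ABC"

# The literal holds the ASCII code points of the characters as written.
ascii_encoded = "ABC".encode("ascii")

# The same characters under an EBCDIC code page (cp500 chosen as an example).
ebcdic_encoded = "ABC".encode("cp500")

# And under UTF-16, little-endian, without a byte order mark.
utf16_encoded = "ABC".encode("utf-16-le")

print(list(literal))         # byte values of the literal
print(list(ebcdic_encoded))  # byte values under EBCDIC
print(list(utf16_encoded))   # byte values under UTF-16-LE
```

Only the ASCII encoding matches the literal byte for byte, even though "ASCII" never appears in the literal itself.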

BTW, there's a problem with bandying around the words "implicit" and "explicit", because they depend on your frame of reference. For example, one person might say that the fact that b"%s" encodes into ASCII is implicit, because ASCII isn't written down in the code anywhere. But another person might say it's explicit, because the manual explicitly says that stuff interpolated into a bytes object is encoded as ASCII.

So arguments of the form "X is bad because it's not explicit" are prone to getting people talking past each other.

-- Greg


