[Python-Dev] PEP 460: allowing %d and %f and mojibake

Paul Moore p.f.moore at gmail.com
Sun Jan 12 23:29:14 CET 2014


On 12 January 2014 22:10, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:

> I think the readability argument becomes a bit sharper when you consider more complex examples, e.g. if I have a tuple of 3 floats that I want to put into a PDF file, then
>
>    b"%f %f %f" % myfloats
>
> is considerably clearer than
>
>    b" ".join((float_to_bytes(f) for f in myfloats))

Hmm, I'm not sure I'd agree. I'd quote "explicit is better than implicit", but given comments below, that would be a mistake :-) Let's just leave it that I'd probably wrap the whole thing in a float_list(floats) function in my application, and not care how it was implemented.
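A minimal sketch of the two spellings being compared, plus the kind of application-level wrapper described above. It assumes Python 3.5 or later, where PEP 461 added %-formatting to bytes (it did not exist when this thread was written). `float_to_bytes` and `float_list` are hypothetical helper names, and the ASCII encoding and default `%f` precision are assumptions.

```python
def float_to_bytes(f):
    # Hypothetical helper: format as text with %f, then encode explicitly as ASCII.
    return ("%f" % f).encode("ascii")


def float_list(floats):
    # Hypothetical application-level wrapper: callers never see how it is built.
    return b" ".join(float_to_bytes(f) for f in floats)


myfloats = (1.0, 2.5, -3.25)

# Spelling 1: bytes %-formatting (Python 3.5+ only).
via_percent = b"%f %f %f" % myfloats

# Spelling 2: explicit per-value conversion plus join.
via_join = b" ".join(float_to_bytes(f) for f in myfloats)

assert via_percent == via_join == float_list(myfloats)
print(via_percent)  # b'1.000000 2.500000 -3.250000'
```

Once the wrapper exists, the choice between the two spellings is confined to one function body, which is the point being made about not caring how it is implemented.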

One thing that this does bring up, though, is that all the talk is about %-formatting. Do the people who are arguing for numeric formatting have views on what (if any) features will be included in bytes.format()? It seems to me that recasting many of the discussions using format() makes it much less "obvious" that adding the features to bytes formatting is a reasonable thing to do. I won't give specific examples, because I would be putting words into people's mouths. But I would say that any genuine proposal for numeric formatting in bytes should be cast as a formal PEP and explicitly document both % and format() behaviours.
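For comparison, a sketch of the format()-based spelling of the same float example. In current Python 3, `bytes` has no `format()` method (PEP 461 added only %-formatting), so the nearest equivalent builds a `str` with `str.format()` and then encodes it explicitly. The ASCII encoding here is an assumption.

```python
myfloats = (1.0, 2.5, -3.25)

# bytes has no format() method in Python 3, so this path is unavailable.
has_bytes_format = hasattr(bytes, "format")

# The format()-based equivalent: build text first, then encode explicitly.
# Both the number -> text step and the text -> bytes step are visible here.
text = "{:f} {:f} {:f}".format(*myfloats)
data = text.encode("ascii")

print(has_bytes_format)  # False
print(data)              # b'1.000000 2.500000 -3.250000'
```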

> It's indicated (I won't say "implied", see below) by the fact that we're interpolating it into a bytes object rather than a string.
>
> This is no more or less implicit than the fact that when we write b"ABC" then we're saying that those characters are to be encoded in ASCII, and not EBCDIC or UTF-16 or...

That's a fair point, and one I had not taken into consideration.
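A quick illustration of the point about bytes literals: `b"ABC"` fixes the byte values as the ASCII codes of those characters, and other encodings of the same characters give different bytes. This is only a sketch; `utf-16-le` and `cp500` (an EBCDIC code page shipped with Python's standard codecs) are example encodings chosen for contrast.

```python
literal = b"ABC"

# The literal's byte values are exactly the ASCII codes of the characters.
assert literal == "ABC".encode("ascii") == bytes([65, 66, 67])

# The same characters in other encodings produce different bytes.
print("ABC".encode("utf-16-le"))  # b'A\x00B\x00C\x00'
print("ABC".encode("cp500"))      # b'\xc1\xc2\xc3' (EBCDIC)

# Non-ASCII characters are rejected in bytes literals at compile time.
rejected = False
try:
    compile('b"caf\u00e9"', "<example>", "eval")
except SyntaxError:
    rejected = True
print(rejected)  # True
```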

> BTW, there's a problem with bandying around the words "implicit" and "explicit", because they depend on your frame of reference. For example, one person might say that the fact that b"%s" encodes into ASCII is implicit, because ASCII isn't written down in the code anywhere. But another person might say it's explicit, because the manual explicitly says that stuff interpolated into a bytes object is encoded as ASCII.

In my defense, I would say that I was trying to clarify Nick's objections, and it's entirely possible I misrepresented this aspect of them.

Personally, I agree that it's not as black and white as simply saying "numeric formatting is wrong", but I think that the fact that %d et al. represent a "double transformation" (from number to string representation to encoded bytes) is the differentiating factor here. Proposals that do nothing but interpolation are essentially convenience wrappers for various combinations of concatenation and join. Adding "double transformation" formatting codes is a step change, and needs to be explicitly acknowledged and justified. (If you do manage to justify such codes, there's a secondary question of precisely which codes should be supported, but we can start by getting agreement that the class of codes is allowed.) PEP 460 explicitly excludes anything but pure interpolation.
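A rough sketch of the distinction drawn above. Pure interpolation only splices existing bytes together, so it reduces to concatenation or join; a numeric code such as `%d` first turns the number into its text representation and then encodes that text. `interpolate` and `format_int` are hypothetical helpers, the `Content-Length` header is just an illustrative payload, ASCII is assumed as the encoding, and the final `%` check needs Python 3.5+.

```python
def interpolate(template_parts, values):
    # Pure interpolation: splice already-encoded bytes between template pieces.
    # template_parts must have exactly one more element than values.
    if len(template_parts) != len(values) + 1:
        raise ValueError("need one more template part than values")
    out = [template_parts[0]]
    for value, part in zip(values, template_parts[1:]):
        if not isinstance(value, (bytes, bytearray)):
            raise TypeError("pure interpolation accepts only bytes-like values")
        out.append(bytes(value))
        out.append(part)
    return b"".join(out)


def format_int(n):
    # "Double transformation": number -> text representation -> encoded bytes.
    text = str(n)                # step 1: int to str, e.g. 1234 -> "1234"
    return text.encode("ascii")  # step 2: str to bytes (ASCII assumed)


header = interpolate([b"Content-Length: ", b"\r\n"], [format_int(1234)])
print(header)  # b'Content-Length: 1234\r\n'

# Bytes %-formatting (Python 3.5+) folds both steps into the single %d code.
assert b"Content-Length: %d\r\n" % 1234 == header
```

Only `format_int` involves choosing a representation and an encoding; `interpolate` never decides how a value is spelled, which is the boundary PEP 460 keeps to.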

> So arguments of the form "X is bad because it's not explicit" are prone to getting people talking past each other.

Fair point. I hope my above paragraph clarifies my position somewhat better.

Paul


