[Python-Dev] PEP 460: allowing %d and %f and mojibake

Ethan Furman ethan at stoneleaf.us
Sun Jan 12 19:26:56 CET 2014


On 01/12/2014 09:26 AM, Paul Moore wrote:

On 12 January 2014 17:03, Ethan Furman <ethan at stoneleaf.us> wrote:

We know full well the difference between unicode and bytes, and we know full well that numbers and much of the text we need have an ASCII (bytes!) representation. When we do

    b'Content Length: %d' % len(binarydata)

we are expecting to get back a bytes object, /not/ a unicode object.

What I am struggling to understand here is what room for compromise there is. Clearly, for whatever reason,

    b'Content Length: ' + str(len(binarydata)).encode('ascii')

is not acceptable for you. OK, fair enough. Also, apparently, writing a helper

    def inttobytes(n):
        return str(n).encode('ascii')

    b'Content Length: ' + inttobytes(len(binarydata))

is unacceptable. But I'm not clear why it's unacceptable. Maybe I missed the explanation - God knows, the thread is long enough :-)

True enough! ;) It's unacceptable in the sense that the bytes type is /almost/ there, it's /almost/ what is needed to handle the boundary conditions. We have a bytes method (how is it supposed to be used?) that could be made to fit the interpolation bill.

It seems to me the core of Nick's refusal is the rejection (and I agree with it!) of bytes interpolation returning unicode -- but that's not what I'm asking for! I'm asking for it to return bytes, with the interpolated data (in the case of %d, %s, etc.) being strictly ASCII-encoded.
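To make the requested semantics concrete, here is a rough Python 3 sketch of "interpolation that always returns bytes". The name `interpolate_ascii` is hypothetical and not part of any proposal in this thread. It covers only %d, %f, %s and %% and ignores width flags for bytes arguments, so it should be read as one interpretation of the paragraph above, not as the behaviour the PEP would specify.

```python
import re

# Matches '%%' or a simple '%d' / '%f' / '%s' spec with optional flags,
# width and precision. Anything else is left in the template untouched.
_SPEC = re.compile(rb'%(?:%|([-+ #0]*\d*(?:\.\d+)?)([dfs]))')


def interpolate_ascii(template, *values):
    """Hypothetical helper: bytes template in, bytes result out.

    Non-bytes values are formatted as text and then encoded as strict
    ASCII, so non-ASCII text raises UnicodeEncodeError instead of
    silently producing mojibake or a unicode result.
    """
    remaining = iter(values)

    def render(match):
        if match.group(0) == b'%%':
            return b'%'
        flags = match.group(1).decode('ascii')
        code = match.group(2).decode('ascii')
        value = next(remaining)
        if code == 's' and isinstance(value, bytes):
            # Bytes pass through unchanged (flags ignored in this sketch).
            return value
        return (('%' + flags + code) % value).encode('ascii')

    return _SPEC.sub(render, template)


binarydata = b'\x00\x01\x02\xff'
header = interpolate_ascii(b'Content Length: %d', len(binarydata))
```

Under these assumptions `header` is a bytes object, and passing non-ASCII text to %s fails loudly at the encoding step.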

On the other hand, Nick has explained why b'Content Length: %d' % len(binarydata) is unacceptable to him (you don't have to agree with his opinion, just concede that he has explained his position in a way that you understand).

Only because he (or Benno) finally wrote some tests and I was able to see what he thought I was wanting. Which does seem to leave a tiny bit of wiggle room if bytes interpolation always returns bytes, and never unicode (yeah, I know, snowball's chance and all that).

I'm not trying to argue you're wrong - I don't know your codebase, nor do I know your application area. But surely somewhere between "we must have % formatting including %d for bytes" and the above, there's a middle ground that you are willing to accept? Can you give any indications of what that might be? What, specifically, about the helper function is the problem? I don't think it is any less space efficient, it doesn't double-encode, and I don't think it's more difficult to understand (although it is a little longer, it trades that off against being a bit more explicit as to what's going on). Surely you're not arguing that your code must work unchanged (not "there's a way of writing the code so it works on Python 2 and 3", but "the code you currently have for Python 2 must work with no changes at all")?
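For reference, the helper-based approach under discussion might look like the sketch below when written to run unchanged on Python 2.6+ and Python 3. The names `ascii_bytes` and `content_length_header`, the sample payload, and the trailing CRLF are illustrative assumptions, not code from either poster.

```python
def ascii_bytes(value):
    """Return str(value) encoded as ASCII (a byte string on 2 and 3)."""
    return str(value).encode('ascii')


def content_length_header(payload):
    """Build the header line by explicit concatenation of byte strings."""
    return b'Content Length: ' + ascii_bytes(len(payload)) + b'\r\n'


binarydata = b'\x00\x01\x02\xff'
line = content_length_header(binarydata)
```

The number is converted to text once and encoded once, and the result is a byte string on both versions. The cost is the extra call and concatenation at every interpolation point.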

I'm arguing from three PoVs:

  1. 2 & 3 compatible code base

  2. having the bytes type /be/ the boundary type

  3. readable code

Can you give an example of code that is nearly acceptable to you, which works in Python 2 and 3 today, and explain what improvements you would like to see to it in order to use it instead of waiting for a core change?

I'm not trying to be difficult (just naturally good at it, I guess ;), but I don't see a lot of room for compromise -- I would like % interpolation, and I'm told I have to use a helper function. I will if I have to, but first I have to try and make myself understood, and I'm not sure that has happened yet. Following Nick's example I'm writing up some tests that clearly show what I would like to see. Then at least we can debate what I'm actually asking for, and not the (understandably unwanted) unicode-what-a-mess-we-had-in-py2k-don't-want-again behaviour that some think I am asking for.

-- Ethan


