[Python-Dev] PEP 460: allowing %d and %f and mojibake

Paul Moore p.f.moore at gmail.com
Sun Jan 12 20:00:32 CET 2014

Previous message: [Python-Dev] PEP 460: allowing %d and %f and mojibake
Next message: [Python-Dev] PEP 460: allowing %d and %f and mojibake
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 12 January 2014 18:26, Ethan Furman <ethan at stoneleaf.us> wrote:

> True enough! ;) It's unacceptable in the sense that the bytes type is /almost/ there, it's /almost/ what is needed to handle the boundary conditions. We have a bytes method (how is it supposed to be used?) that could be made to fit the interpolation bill.

>> And yet I still don't follow what you want. Unless it's that b'%d' % (12,) must work and give b'12', and nothing else is acceptable. Maybe more accurately, I don't see what you want to do that can't be done in another way. All I'm seeing in your rejection of alternative suggestions is "it's not %-interpolation using %d".

> I'm arguing from three PoVs:
> 1) 2 & 3 compatible code base
> 2) having the bytes type /be/ the boundary type
> 3) readable code

The only one of these that I can see being in any way an argument against

    def int_to_bytes(n):
        return str(n).encode('ascii')

    b'Content-Length: ' + int_to_bytes(len(binary_data))

is (3), and that's largely subjective. Personally, I see very little difference between the above and %d-interpolation in terms of readability. On brevity, %d certainly wins, but that's not important on its own, and I'd argue that my version is clearer in terms of describing the intent (and would be even better if I weren't rubbish at thinking of function names, or if this weren't shown in isolation and more application-focused functions were used).
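
For anyone who wants to try the comparison, here is a self-contained version of the helper above. The `content_length_header` wrapper and the sample payload are hypothetical, added only for illustration. The final check compares against `b'%d' %`, which only works on Python 3.5 and later (bytes %-formatting arrived with PEP 461, accepted after this thread), so it is guarded by a version test.

```python
import sys


def int_to_bytes(n: int) -> bytes:
    # Explicit two-step conversion: int -> str (base 10) -> bytes (ASCII).
    return str(n).encode('ascii')


def content_length_header(binary_data: bytes) -> bytes:
    # Hypothetical wrapper: builds a header line from a byte payload.
    return b'Content-Length: ' + int_to_bytes(len(binary_data))


payload = b'\x00\x01\x02' * 4  # 12 arbitrary bytes
header = content_length_header(payload)
print(header)  # b'Content-Length: 12'

# Bytes %-interpolation only exists on Python 3.5+ (PEP 461).
if sys.version_info >= (3, 5):
    assert header == b'Content-Length: %d' % len(payload)
```

Both spellings produce the same bytes; the helper simply names the encoding step instead of leaving it to the format operator.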

> It seems to me the core of Nick's refusal is the (and I agree!) rejection of bytes interpolation returning unicode -- but that's not what I'm asking for! I'm asking for it to return bytes, with the interpolated data (in the case of %d, %s, etc) being strictly-ASCII encoded.

My reading of Nick's refusal is that %d takes a value which is semantically a number, converts it into a base-10 representation (which is semantically a string, not a sequence of bytes[1]) and then encodes that string into a series of bytes using the ASCII encoding. That is two semantic transformations, and one of them (the ASCII encoding) is implicit. Specifically, it's implicit because (a) the normal reading of %d is "produce the base-10 representation of a number", and a base-10 representation is a string, and (b) nowhere has ASCII been mentioned (why not UTF-16? That would be entirely plausible for a wchar-based environment like Windows). And a core principle of the bytes/text separation in Python 3 is that encoding should never happen implicitly.
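
To make the two transformations visible, here is a minimal sketch that performs them as separate, explicit steps. The function name `format_decimal` and its `encoding` parameter are hypothetical; the point is only that the resulting bytes depend entirely on which encoding the caller names.

```python
def format_decimal(n: int, encoding: str) -> bytes:
    # Step 1: number -> base-10 representation, which is text (str).
    text = str(n)
    # Step 2: text -> bytes, using an encoding the caller must choose.
    return text.encode(encoding)


print(format_decimal(123, 'ascii'))      # b'123'
print(format_decimal(123, 'utf-16-le'))  # b'1\x002\x003\x00'
print(format_decimal(123, 'utf-16-be'))  # b'\x001\x002\x003'
```

The same number yields three different byte sequences, which is why silently picking ASCII counts as an implicit encoding.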

By the way, I should point out that I would never have understood any of the ideas involved in this thread before Python 3 forced me to think about Unicode and the distinction between text and bytes. And yet, I now find myself, in my (non-Python) work environment, being the local expert whenever applications screw up text encodings. So I, for one, am very grateful for Python 3's clear separation of bytes and text. (And if I sometimes come across as over-dogmatic, I apologise - put it down to the enthusiasm of the recent convert :-))

Paul

[1] If you cannot see that there's no essential reason why the base-10 representation '123' should correspond to the bytes b'\x31\x32\x33' then you are probably not old enough to have started programming on EBCDIC-based computers :-)
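
For those who never met EBCDIC, Python still ships EBCDIC codecs, so the footnote can be checked directly. This sketch uses the `cp037` codec (EBCDIC, US/Canada) as one example code page; in it the digits '0' to '9' occupy bytes 0xF0 to 0xF9.

```python
text = '123'

ascii_bytes = text.encode('ascii')
ebcdic_bytes = text.encode('cp037')  # EBCDIC (US/Canada) code page

print(ascii_bytes)   # b'123', i.e. b'\x31\x32\x33'
print(ebcdic_bytes)  # b'\xf1\xf2\xf3'

# Same text, different bytes: the mapping is a property of the encoding.
assert ascii_bytes != ebcdic_bytes
assert ebcdic_bytes.decode('cp037') == text
```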


More information about the Python-Dev mailing list