[Python-Dev] PEP 460 reboot

Donald Stufft donald at stufft.io
Mon Jan 13 19:58:06 CET 2014


On Jan 13, 2014, at 1:45 PM, Daniel Holth <dholth at gmail.com> wrote:

On Mon, Jan 13, 2014 at 12:42 PM, R. David Murray <rdmurray at bitdance.com> wrote:

On Mon, 13 Jan 2014 12:41:18 +0100, Antoine Pitrou <solipsis at pitrou.net> wrote:

On Sun, 12 Jan 2014 18:11:47 -0800 Guido van Rossum <guido at python.org> wrote:

On Sun, Jan 12, 2014 at 5:27 PM, Ethan Furman <ethan at stoneleaf.us> wrote:

On 01/12/2014 04:47 PM, Guido van Rossum wrote:

%s seems the trickiest: I think with a bytes argument it should just insert those bytes (and the padding modifiers should work too), and for other types it should probably work like %a, so that it works as expected for numeric values, and with a string argument it will return the ascii()-variant of its repr(). Examples:

    b'%s' % 42 == b'42'
    b'%s' % 'x' == b"'x'"  (i.e. the three-byte string containing an 'x' enclosed in single quotes)

I'm not sure about the quotes. Would anyone ever actually want those in the byte stream?

Perhaps not, but it's a hint that you should probably think about an encoding. It's symmetric with how '%s' % b'x' returns "b'x'". Think of it as payback time. :-)

What is the use case for embedding a quoted ASCII-encoded representation in a byte stream?

There is no use case in the sense you are asking, just like there is no real use case for '%s' % b'x' producing "b'x'". But the real use case is exactly the same: to let you know your code is screwed up without actually blowing up with an encoding exception.

For the record, I like Guido's logic and proposal. I don't understand Nick's objection, since I don't see the difference between the situation here where a string gets interpolated into bytes as 'xxx' and the corresponding situation where bytes gets interpolated into a string as b'xxx'. Why struggle to keep bytes interpolation "pure" if string interpolation isn't?

Guido's proposal makes the language more symmetric, and thus more consistent and less surprising. Exactly the hallmarks of Python's design sense, IMO. (Big surprise, right? :)

Of course, this point of view is based on the idea that when you are doing interpolation using %/.format, you are in fact primarily concerned with ASCII compatible byte streams. This is a Practicality sort of argument. It is, after all, by far the most common use case when doing interpolation[*].

If you wanted to do a purist version of this symmetry, you'd have bytes(x) calling __bytes__ if it was defined and falling back to calling a __brepr__ otherwise. But what would __brepr__ implement? The variety of format codes in the struct module argues that there is no "one obvious" binary repr for most types. (Those that have one would implement __bytes__.) And what would be the __brepr__ of an arbitrary 'object'?

Faced with the impracticality of defining __brepr__ usefully in any "pure bytes" form, it seems sensible to admit that the most useful __brepr__ is the ascii() encoding of the repr. Which naturally produces 'xxx' as the __brepr__ of a string.

This does cause things to get a little un-pretty when you are operating at the python prompt:

    >>> b'%s' % object
    b'"<class \'object\'>"'

But then again that is most likely really not what you mean to do, so it becomes a big red flag...just like b'xxx' is a small red flag when you accidentally interpolate unencoded bytes into a string.

--David

PS: When I first read Guido's remark that the result of interpolating a string should be 'xxx', I went Wah? I had to reason my way through to it as above, but to him it was just the natural answer. Guido isn't always right, but this kind of automatic language design consistency is one reason he's the BDFL.

[*] I still think that you mostly want to design your library so that you are handling the text parts as text and the bytes parts as bytes, and encoding/gluing them as appropriate at the IO boundary. But if Guido says his real code would benefit by being able to interpolate ASCII into bytes at certain points, I'll believe him.

If you think corrupted data is easier or more pleasant to track down than encoding exceptions then I think you are strange. It makes porting really difficult while you are still trying to figure out where the bytes/str boundaries are. I am now deeply suspicious of all % formatting.
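The %s rule proposed at the top of the quoted thread can be sketched in a few lines. This is only an illustration: `proposed_percent_s` is a hypothetical helper name, not part of any Python release, and padding modifiers are left out.

```python
def proposed_percent_s(value):
    """Hypothetical helper: emulate the %s rule proposed in the quoted thread."""
    if isinstance(value, (bytes, bytearray)):
        # A bytes argument is inserted unchanged.
        return bytes(value)
    # Any other type is treated like %a: take the ASCII-only repr
    # (as produced by ascii()) and encode it as ASCII bytes.
    return ascii(value).encode("ascii")


print(proposed_percent_s(42))    # the number rendered as ASCII digits
print(proposed_percent_s("x"))   # the str keeps its repr() quotes
print(proposed_percent_s(b"x"))  # bytes pass through untouched
print("%s" % b"x")               # the existing str-side behaviour being mirrored
```

Under this rule a str argument ends up in the output with its repr() quotes, which is the "red flag" behaviour the thread is debating.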



For the record, I think %d, %f and similar codes are fine: the RHS is guaranteed to produce a certain set of “characters” that are all ASCII compatible, so an implicit ASCII encode is perfectly acceptable there. The %s code I’m not sure of. I think trying to ASCII encode the argument (just using encode()) is dangerous, and I think that using ascii() and adding quotes is never what anyone is going to want. Given that, I think it’d be far better to blow up if you’re using %s (or at least if you’re using %s on a str object rather than as an alias for %b) than either to implicitly encode the argument, given that we don’t know what the RHS can contain, or to throw junk data into the bytes that we know pretty much nobody is ever going to want.
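A minimal sketch of the stricter alternative described above, assuming two hypothetical helpers (`strict_percent_s` and `format_number` are names invented here for illustration): numeric codes are formatted as text and ASCII encoded, while %s accepts only bytes-like input and raises on anything else.

```python
def strict_percent_s(value):
    """Hypothetical helper: %s acts as an alias for %b and rejects str."""
    if isinstance(value, (bytes, bytearray, memoryview)):
        return bytes(value)
    # Blow up instead of guessing an encoding or inserting a quoted repr.
    raise TypeError("%s requires a bytes-like object, not "
                    + type(value).__name__)


def format_number(spec, value):
    """Hypothetical helper: numeric codes such as %d or %f only ever
    produce ASCII characters, so an implicit ASCII encode is safe."""
    return (spec % value).encode("ascii")


print(format_number("%d", 42))
print(format_number("%.2f", 1.5))
print(strict_percent_s(b"x"))
try:
    strict_percent_s("x")
except TypeError as exc:
    print("rejected:", exc)
```

The design choice is to fail at the point of interpolation, so a misplaced str shows up as an exception rather than as quoted text inside the byte stream. As it happens, bytes formatting as later shipped in Python 3.5 (PEP 461) also raises TypeError for b'%s' % 'x'.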


Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA



