[Python-Dev] PEP 460 reboot

Daniel Holth dholth at gmail.com
Mon Jan 13 21:07:47 CET 2014


I see it now. b"foo%sbar" % b'baz' should also expand to b"foob'baz'bar"
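For reference, a minimal sketch of the str behaviour this remark alludes to. It only illustrates how %s on a str template treats a bytes argument in Python 3 (run with default interpreter flags); it is not a statement about what bytes formatting should do.

```python
# In Python 3, %s on a str template calls str() on the argument. For a bytes
# object, str() returns the repr-style text, so the b'...' wrapper leaks in.
text = "foo%sbar" % b"baz"
print(text)  # foob'baz'bar

# Joining the pieces as bytes involves no formatting step at all.
joined = b"".join([b"foo", b"baz", b"bar"])
print(joined)  # b'foobazbar'
```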

Instead of "%b" could "%j" mean "I should have used + or join() here but was too lazy" and work on str too?

On Mon, Jan 13, 2014 at 2:51 PM, Terry Reedy <tjreedy at udel.edu> wrote:

On 1/13/2014 1:40 PM, Brett Cannon wrote:

> > So bytes formatting really needn't (and shouldn't, IMO) mirror str formatting.

This was my presumption in writing byteformat().

> I think one of the things about Guido's proposal that bugs me is that it breaks the mental model of the .format() method from str in terms of how the mini-language works. For str.format() you have the conversion and the format spec (e.g. "{!r}" and "{:d}", respectively). You apply the conversion by calling the appropriate built-in, e.g. 'r' calls repr(). The format spec semantically gets passed with the object to format() which calls the object's __format__() method: format(number, 'd').
>
> Now Guido's suggestion has two parts that affect the mini-language for .format(). One is that for bytes.format() the default conversion is bytes() instead of str(), which is fine (probably want to add 'b' as a conversion value as well to be consistent). But the other bit is that the format spec goes from semantically meaning ``format(thing, format_spec)`` to ``format(thing, format_spec).encode('ascii', 'strict')`` for at least numbers. That implicitness bugs me as I have always thought of format specs just leading to a call to format(). I think I can live with it, though, as long as it is consistently applied across the board for bytes.format(): every use of a format spec leads to calling ``format(thing, format_spec).encode('ascii', 'strict')`` no matter what type 'thing' would be, and it is clearly documented that this is done to ease porting and handle the common case.

This is how my byteformat function works, except that when no format_spec is given, bytes and bytearray objects are left unchanged rather than being decoded and encoded again.

> This even gives people in-place ASCII encoding for strings by always using '{:s}' with text, which they can do when they port their code to run under both Python 2 and 3. So you should be able to do b'Content-Type: {:s}'.format('image/jpeg') and have it give ASCII. If you want more explicit encoding to latin-1 then you need to do it explicitly and not rely on the mini-language to do tricks for you.
>
> IOW I want to treat the format mini-language as a language and thus not have any special-casing or massive shifts in meaning between str.format() and bytes.format(), so my mental model doesn't have to contort based on whether it's str or bytes. My preference is not to have any, but if Guido is going to say PBP here then I want absolute consistency across the board in how bytes.format() tweaks things.
>
> As for %s for the % operator calling ascii(), I think that will be a porting nightmare of finding out why your bytes suddenly stopped being formatted properly and then having to crawl through all of your code for that one use of %s which is getting bytes in. By raising a TypeError you will very easily detect where your screw-up occurred thanks to the traceback; doing otherwise feels too much like implicit type conversion, and ask any JavaScript developer how that can be a bad thing.

I personally would not add 'bytes % whatever'.

-- 
Terry Jan Reedy
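An illustrative sketch of the mental model discussed above (not code from the thread, and not Terry's byteformat). The first part shows the conversion versus format-spec split in str.format(). The second part is a hypothetical helper, here called ascii_format, that applies the ``format(thing, format_spec).encode('ascii', 'strict')`` rule uniformly to any object. The helper name and the way the header is assembled are assumptions made for the example, since bytes has no format() method to call directly.

```python
# Part 1: the str.format() mini-language as described in the quoted text.
# A conversion ("!r") calls the matching built-in; a format spec (":d")
# is handed to format(), which calls the object's __format__().
assert "{!r}".format("x") == repr("x")
assert "{:d}".format(42) == format(42, "d")


# Part 2: hypothetical helper (name is an assumption) applying the rule
# "format as text, then encode strictly as ASCII" regardless of type.
def ascii_format(thing, format_spec=""):
    return format(thing, format_spec).encode("ascii", "strict")


# Rough equivalent of the b'Content-Type: {:s}'.format('image/jpeg') idea.
header = b"Content-Type: " + ascii_format("image/jpeg", "s")
length = b"Content-Length: " + ascii_format(1024, "d")
print(header)  # b'Content-Type: image/jpeg'
print(length)  # b'Content-Length: 1024'
```

Because the encode step is strict, text outside ASCII raises UnicodeEncodeError instead of being silently converted, which matches the point that a wider encoding such as latin-1 has to be requested explicitly.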

