On Mon, Jan 13, 2014 at 5:14 PM, Guido van Rossum
<guido@python.org> wrote:
On Mon, Jan 13, 2014 at 2:05 PM, Brett Cannon <
brett@python.org> wrote:
> I have been going on the assumption that bytes.format() would change what
> '{}' meant for itself and would only interpolate bytes. That convenient
> between Python 2 and 3 since it represents what we want it to (str and bytes
> under the hood, respectively), so it just falls through. We could also add a
> 'b' conversion for bytes() explicitly so as to help people not accidentally
> mix up things in bytes.format() and str.format(). But I was not suggesting
> adding a specific format spec for bytes but instead making bytes.format()
> just do the .encode('ascii') automatically to help with compatibility when a
> format spec was present. If people want fancy formatting for bytes they can
> always do it themselves before calling bytes.format().
This seems hastily written (e.g. verb missing :-), and I'm not clear
on what you are (or were) actually proposing. When exactly would
bytes.format() need .encode('ascii')?
I would be happy to wait a few hours or days for you to write it up
clearly, rather than responding in a hurry.
1. New conversion operator of 'b' that operates as PEP 460 specifies (i.e. tries to get a buffer, else calls __bytes__). The default conversion changes from 's' to 'b'.
2. Use of a format spec adds an extra step of calling str.encode('ascii', 'strict') on the result returned from calling __format__().
That's it. So point 1 means that the following would work in Python 3.5::
    b'Hello, {}, how are you?'.format(b'Guido')
    b'Hello, {!b}, how are you?'.format(b'Guido')
It would produce an error if you used a text argument for 'Guido', since str neither defines __bytes__ nor supports the buffer protocol. That gives the EIBTI group their bytes.format() where nothing magical happens.
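A minimal sketch of point 1 that runs on today's Python 3, assuming a hypothetical helper named convert_b() and a hypothetical class HasBytes (neither is part of PEP 460 or the standard library). It only illustrates the "try the buffer, else call __bytes__" order described above:

```python
def convert_b(value):
    """Hypothetical stand-in for the proposed 'b' conversion."""
    try:
        # First choice: anything exporting the buffer protocol
        # (bytes, bytearray, memoryview, array.array, ...).
        return bytes(memoryview(value))
    except TypeError:
        pass
    # Second choice: __bytes__, looked up on the type like other special methods.
    to_bytes = getattr(type(value), '__bytes__', None)
    if to_bytes is None:
        raise TypeError(
            "{!r} object supports neither the buffer protocol nor __bytes__"
            .format(type(value).__name__))
    return to_bytes(value)


class HasBytes:
    """Hypothetical type that opts in through __bytes__ only."""
    def __bytes__(self):
        return b'Guido'


print(convert_b(b'Guido'))     # bytes pass straight through
print(convert_b(HasBytes()))   # falls back to __bytes__
# convert_b('Guido') raises TypeError: str offers neither mechanism.
```

The explicit TypeError for str is the "nothing magical happens" behaviour: text never gets encoded implicitly on this path.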
For point 2, let's say you have the following in Python 2::
    'I have {} bottles of beer on the wall'.format(10)
Under my proposal, how would you change it to get the same result in Python 2 and 3?::
    b'I have {:d} bottles of beer on the wall'.format(10)
In Python 2 you're just being more explicit about the format; otherwise it's the same semantics as today. In Python 3, though, this would translate into (under the hood)::
    b'I have {} bottles of beer on the wall'.format(format(10, 'd').encode('ascii', 'strict'))
This leads to the same bytes value in Python 2 (since it's just a string) and in Python 3 (as everything accepted by bytes.format() is either bytes already or converted to ASCII bytes by encoding). While Python 2 users would need to make sure they used a format spec to get the same result in both Python 2 and 3 for ASCII bytes, it's a minor change which also makes the format more explicit, so it's not an inherently bad thing. And those who don't want to utilize the automatic ASCII encoding can simply leave the format spec out of the format string and pass in bytes directly (i.e. call __format__() themselves and then call str.encode() on their own). So the PBP people get a simple way to use bytes.format() in Python 2 and 3 when dealing with things that can be represented as ASCII (just as the bytes methods currently allow).
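As a rough illustration of how the two points combine for a single replacement field, here is a sketch for today's Python 3. The helper name render_field() is hypothetical, and the no-spec branch is simplified to accept only the common bytes-like types or __bytes__:

```python
def render_field(value, format_spec=''):
    """Hypothetical emulation of one replacement field under the proposal."""
    if format_spec:
        # Point 2: a format spec means __format__() is called and the
        # resulting text is encoded as strict ASCII.
        return format(value, format_spec).encode('ascii', 'strict')
    # Point 1: no format spec, so only bytes-like input is accepted.
    if isinstance(value, (bytes, bytearray, memoryview)):
        return bytes(value)
    to_bytes = getattr(type(value), '__bytes__', None)
    if to_bytes is not None:
        return to_bytes(value)
    raise TypeError("{!r} object cannot be interpolated without a format spec"
                    .format(type(value).__name__))


# Equivalent of b'I have {:d} bottles of beer on the wall'.format(10)
line = b'I have ' + render_field(10, 'd') + b' bottles of beer on the wall'
print(line)

# Equivalent of b'Hello, {}, how are you?'.format(b'Guido')
print(b'Hello, ' + render_field(b'Guido') + b', how are you?')
```

In this sketch, text that cannot be represented as ASCII fails loudly: render_field('\xe9', 's') raises UnicodeEncodeError because of the 'strict' error handler, and render_field(10) raises TypeError because no format spec was given.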
I think this covers your desire to have numbers and anything else that can be represented as ASCII be supported for easy porting while covering my desire that any automatic encoding is clearly explicit in the format string and in no way special-cased for only some types (the introduction of a 'c' converter from PEP 460 is also fine with me).
How you would want to translate this proposal to the % operator I'm not sure, since it has been quite a while since I last seriously used it, so I don't think I'm in a good position to propose a shift for it.