[Python-Dev] PEP 460 reboot

Yury Selivanov yselivanov.ml at gmail.com
Tue Jan 14 18:29:58 CET 2014


Brett,

I like your proposal.  There is one idea I have that could, perhaps, improve it:

  1. "%s" and "{}" will continue to work for bytes and bytearray in the following fashion:

 - Check whether the argument supports bytes/Py_buffer.
 - If it does, check that the bytes are strictly in the printable ASCII subset (a-z, A-Z, 0-9 + special symbols like ! etc). Throw an error if the check fails; otherwise concatenate.
 - If it does not, try str() and do ".encode('ascii', 'strict')" on the result.

This way most of the Python 2 use cases will be covered without touching the code. So:

 - b'Hello {}'.format('world') will be the same as b'Hello ' + str('world').encode('ascii', 'strict')

 - b'Hello {}'.format('\u0394') will throw UnicodeEncodeError

 - b'Status: {}'.format(200) will be the same as b'Status: ' + str(200).encode('ascii', 'strict')

 - b'Hello %s' % ('world',) - the same as the first example

 - b'Connection: {}'.format(b'keep-alive') - works

 - b'Hello %s' % (b'\xce\x94',) - will fail, as it is not in the ASCII subset we accept
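The three steps and the examples above can be sketched as a plain function. This is only an illustration, not an implementation of bytes formatting: the name interpolate_value is hypothetical, "printable ASCII" is assumed to mean bytes 0x20 through 0x7E, and ValueError is an assumed choice for the "throw an error" case.

```python
# Assumption: "printable ASCII" means bytes 0x20 (space) through 0x7E (~).
_PRINTABLE_ASCII = frozenset(range(0x20, 0x7F))


def interpolate_value(value):
    """Hypothetical helper: convert one argument to bytes per the steps above."""
    try:
        # Step 1: does the object expose a buffer (bytes, bytearray, ...)?
        raw = bytes(memoryview(value))
    except TypeError:
        # Step 3: no buffer, so fall back to str() plus strict ASCII encoding.
        return str(value).encode('ascii', 'strict')
    # Step 2: buffers must stay inside the printable ASCII subset.
    if not all(byte in _PRINTABLE_ASCII for byte in raw):
        raise ValueError('bytes argument is outside the printable ASCII subset')
    return raw


# Rough equivalent of b'Status: {}'.format(200) under this proposal.
status_line = b'Status: ' + interpolate_value(200)
```

With this sketch, a str such as '\u0394' fails in the encode step with UnicodeEncodeError, while b'\xce\x94' fails the subset check.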

I think it’s OK to check the buffers for ASCII-subset only. Yes, it will have some sort of sub-optimal performance, but then, it’s quite rare when string formatting is used to concatenate huge buffers.

  2. New operators {!b} and %b. These will just use '__bytes__' and Py_buffer.

--
Yury Selivanov

On January 14, 2014 at 11:31:51 AM, Brett Cannon (brett at python.org) wrote:

On Mon, Jan 13, 2014 at 5:14 PM, Guido van Rossum wrote:

> On Mon, Jan 13, 2014 at 2:05 PM, Brett Cannon wrote:
> > I have been going on the assumption that bytes.format() would change what
> > '{}' meant for itself and would only interpolate bytes. That convenient
> > between Python 2 and 3 since it represents what we want it to (str and bytes
> > under the hood, respectively), so it just falls through. We could also add a
> > 'b' conversion for bytes() explicitly so as to help people not accidentally
> > mix up things in bytes.format() and str.format(). But I was not suggesting
> > adding a specific format spec for bytes but instead making bytes.format()
> > just do the .encode('ascii') automatically to help with compatibility when a
> > format spec was present. If people want fancy formatting for bytes they can
> > always do it themselves before calling bytes.format().
>
> This seems hastily written (e.g. verb missing :-), and I'm not clear
> on what you are (or were) actually proposing. When exactly would
> bytes.format() need .encode('ascii')?
>
> I would be happy to wait a few hours or days for you to write it up
> clearly, rather than responding in a hurry.

Sorry about that. Busy day at work + trying to stay on top of this entire conversation was a bit tough.

Let me try to lay out what I'm suggesting for bytes.format() in terms of how it changes http://docs.python.org/3/library/string.html#format-string-syntax for bytes.

1. New conversion operator of 'b' that operates as PEP 460 specifies (i.e. tries to get a buffer, else calls __bytes__). The default conversion changes from 's' to 'b'.

2. Use of the conversion field adds a step of calling str.encode('ascii', 'strict') on the result returned from calling format().

That's it. So point 1 means that the following would work in Python 3.5::

    b'Hello, {}, how are you?'.format(b'Guido')
    b'Hello, {!b}, how are you?'.format(b'Guido')

It would produce an error if you used a text argument for 'Guido' since str doesn't define __bytes__ or a buffer. That gives the EIBTI group their bytes.format() where nothing magical happens.

For point 2, let's say you have the following in Python 2::

    'I have {} bottles of beer on the wall'.format(10)

Under my proposal, how would you change it to get the same result in Python 2 and 3?::

    b'I have {:d} bottles of beer on the wall'.format(10)

In Python 2 you're just being more explicit about the format, otherwise it's the same semantics as today. In Python 3, though, this would translate into (under the hood)::

    b'I have {} bottles of beer on the wall'.format(format(10, 'd').encode('ascii', 'strict'))

This leads to the same bytes value in Python 2 (since it's just a string) and in Python 3 (as everything accepted by bytes.format() is either bytes already or converted to ASCII bytes by encoding). While Python 2 users would need to make sure they used a format spec to get the same result in both Python 2 and 3 for ASCII bytes, it's a minor change which also makes the format more explicit, so it's not an inherently bad thing.

And those who don't want to utilize the automatic ASCII encoding can just not use a format spec in the format string and pass in bytes directly (i.e. call format() themselves and then call str.encode() on their own).

So PBP people get to have a simple way to use bytes.format() in Python 2 and 3 when dealing with things that can be represented as ASCII (just as the bytes methods allow for currently). I think this covers your desire to have numbers and anything else that can be represented as ASCII be supported for easy porting while covering my desire that any automatic encoding is clearly explicit in the format string and in no way special-cased for only some types (the introduction of a 'c' converter from PEP 460 is also fine with me).

How you would want to translate this proposal to the % operator I'm not sure, since it has been quite a while since I last seriously used it, and so I don't think I'm in a good position to propose a shift for it.
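As a rough illustration of the two points above, the per-field behaviour might look like the sketch below. bytes has no format() method in Python 3, so this is a stand-alone function; the name convert_field and the TypeError message are hypothetical, and the buffer lookup via memoryview is an assumption about how "tries to get a buffer" could be emulated in pure Python.

```python
def convert_field(value, format_spec=''):
    """Hypothetical helper: convert one replacement field to bytes."""
    if format_spec:
        # Point 2: a format spec means "format as text, then encode
        # strictly as ASCII".
        return format(value, format_spec).encode('ascii', 'strict')
    # Point 1: the default 'b' conversion takes a buffer if there is one...
    try:
        return bytes(memoryview(value))
    except TypeError:
        pass
    # ...and otherwise calls __bytes__, looked up on the type.
    to_bytes = getattr(type(value), '__bytes__', None)
    if to_bytes is None:
        raise TypeError('%s defines neither a buffer nor __bytes__'
                        % type(value).__name__)
    return to_bytes(value)


# Rough equivalent of b'I have {:d} bottles of beer on the wall'.format(10).
line = (b'I have ' + convert_field(10, 'd')
        + b' bottles of beer on the wall')
```

Under this sketch, convert_field('Guido') raises TypeError because str defines neither a buffer nor __bytes__, while convert_field('Guido', 's') goes through the ASCII encoding step.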

