On Tue, Jan 14, 2014 at 9:29 AM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
- Try str(), and do ".encode('ascii', 'strict')" on the result.
please no -- that's the source of a lot of pain in py2 now.
Having a failure as a result of the value, rather than the type, of an object just makes for hard-to-test-for bugs. Everything will be hunky dory in development and testing, then in deployment some idiot ( ;-) ) will pass in some non-ASCII-compatible string and you get a failure. And the person who gets the failure doesn't understand why, or they wouldn't have passed in non-ASCII values in the first place...
Ease of porting is nice, but let's not make it easy to port bug-prone code.
-Chris
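To make that failure mode concrete, here is a minimal sketch of the "str(), then strict ASCII encode" step. The helper name to_ascii_bytes is made up for illustration and is not part of any proposal. The same call succeeds or fails depending only on the value it receives, so tests that use ASCII-only data will not catch the problem:

```python
def to_ascii_bytes(obj):
    # Hypothetical helper: the str() + strict-ASCII step quoted above.
    return str(obj).encode('ascii', 'strict')

# Same type (str), different values, different outcomes.
ok = to_ascii_bytes('world')      # what development and test data look like

try:
    to_ascii_bytes('\u0394')      # a non-ASCII value arriving in deployment
    failed = False
except UnicodeEncodeError:
    failed = True
```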
This way \*most\* of the use cases of python2 will be covered without
touching the code. So:
\- b'Hello {}'.format('world')
will be the same as b'Hello ' + str('world').encode('ascii', 'strict')
\- b'Hello {}'.format('\\u0394') will throw UnicodeEncodeError
\- b'Status: {}'.format(200)
will be the same as b'Status: ' + str(200).encode('ascii', 'strict')
\- b'Hello %s' % ('world',) - the same as the first example
\- b'Connection: {}'.format(b'keep-alive') - works
\- b'Hello %s' % (b'\\xce\\x94',) - will fail, since it is outside the ASCII subset we accept
I think it’s OK to check the buffers for ASCII-subset only. Yes, it
will have some sort of sub-optimal performance, but then, it’s quite
rare when string formatting is used to concatenate huge buffers.
2\. New operators {!b} and %b. These will just use '\_\_bytes\_\_' and
Py\_buffer.
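As a rough illustration of the examples listed above, the default behaviour could be emulated in today's Python along these lines. This is only a sketch under assumptions: bformat and \_to\_text are hypothetical names, only '{}'-style fields are handled (not %s), and the rule "bytes arguments must be ASCII-only, everything else goes through str() and a strict ASCII encode" is my reading of the examples rather than a specification.

```python
def _to_text(arg):
    # Bytes-like arguments are accepted only if they stay in the ASCII subset.
    if isinstance(arg, (bytes, bytearray, memoryview)):
        return bytes(arg).decode('ascii', 'strict')   # UnicodeDecodeError otherwise
    # Everything else: str(), then insist that the result is pure ASCII.
    text = str(arg)
    text.encode('ascii', 'strict')                    # UnicodeEncodeError otherwise
    return text

def bformat(template, *args):
    # Hypothetical stand-in for the proposed bytes.format(); not a real API.
    # latin-1 maps bytes 0-255 to code points 0-255, so the template round-trips.
    text = template.decode('latin-1').format(*[_to_text(a) for a in args])
    return text.encode('latin-1')
```

With this sketch, bformat(b'Status: {}', 200) and bformat(b'Connection: {}', b'keep-alive') succeed, while a '\\u0394' or b'\\xce\\x94' argument raises a Unicode error.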
\--
Yury Selivanov
On January 14, 2014 at 11:31:51 AM, Brett Cannon (brett@python.org) wrote:
\>
\> On Mon, Jan 13, 2014 at 5:14 PM, Guido van Rossum wrote:
\>
\> > On Mon, Jan 13, 2014 at 2:05 PM, Brett Cannon wrote:
\> > > I have been going on the assumption that bytes.format() would change what
\> > > '{}' meant for itself and would only interpolate bytes. That convenient
\> > > between Python 2 and 3 since it represents what we want it to (str and bytes
\> > > under the hood, respectively), so it just falls through. We could also add a
\> > > 'b' conversion for bytes() explicitly so as to help people not accidentally
\> > > mix up things in bytes.format() and str.format(). But I was not suggesting
\> > > adding a specific format spec for bytes but instead making bytes.format()
\> > > just do the .encode('ascii') automatically to help with compatibility when a
\> > > format spec was present. If people want fancy formatting for bytes they can
\> > > always do it themselves before calling bytes.format().
\> >
\> > This seems hastily written (e.g. verb missing :-), and I'm not clear
\> > on what you are (or were) actually proposing. When exactly would
\> > bytes.format() need .encode('ascii')?
\> >
\> > I would be happy to wait a few hours or days for you to write it up
\> > clearly, rather than responding in a hurry.
\>
\>
\> Sorry about that. Busy day at work + trying to stay on top of this entire
\> conversation was a bit tough. Let me try to lay out what I'm suggesting for
\> bytes.format() in terms of how it changes
\> http://docs.python.org/3/library/string.html#format-string-syntax for bytes.
\>
\> 1\. New conversion operator of 'b' that operates as PEP 460 specifies (i.e.
\> tries to get a buffer, else calls \_\_bytes\_\_). The default conversion
\> changes from 's' to 'b'.
\> 2\. Use of the conversion field adds a step of calling
\> str.encode('ascii', 'strict') on the result returned from calling
\> \_\_format\_\_().
\>
\> That's it. So point 1 means that the following would work in Python 3.5::
\>
\>     b'Hello, {}, how are you?'.format(b'Guido')
\>     b'Hello, {!b}, how are you?'.format(b'Guido')
\>
\> It would produce an error if you used a text argument for 'Guido' since str
\> doesn't define \_\_bytes\_\_ or a buffer. That gives the EIBTI group their
\> bytes.format() where nothing magical happens.
\>
\> For point 2, let's say you have the following in Python 2::
\>
\>     'I have {} bottles of beer on the wall'.format(10)
\>
\> Under my proposal, how would you change it to get the same result in Python
\> 2 and 3?::
\>
\>     b'I have {:d} bottles of beer on the wall'.format(10)
\>
\> In Python 2 you're just being more explicit about the format, otherwise
\> it's the same semantics as today. In Python 3, though, this would translate
\> into (under the hood)::
\>
\>     b'I have {} bottles of beer on the wall'.format(format(10, 'd').encode('ascii', 'strict'))
\>
\> This leads to the same bytes value in Python 2 (since it's just a string)
\> and in Python 3 (as everything accepted by bytes.format() is either bytes
\> already or converted to ASCII bytes by encoding). While Python 2 users
\> would need to make sure they used a format spec to get the same result in
\> both Python 2 and 3 for ASCII bytes, it's a minor change which also makes
\> the format more explicit so it's not an inherently bad thing. And for those
\> that don't want to utilize the automatic ASCII encoding they can just not
\> use a format spec in the format string and just pass in bytes directly
\> (i.e. call \_\_format\_\_() themselves and then call str.encode() on their
\> own). So PBP people get to have a simple way to use bytes.format() in
\> Python 2 and 3 when dealing with things that can be represented as ASCII
\> (just as the bytes methods allow for currently).
\>
\> I think this covers your desire to have numbers and anything else that can
\> be represented as ASCII be supported for easy porting while covering my
\> desire that any automatic encoding is clearly explicit in the format string
\> and in no way special-cased for only some types (the introduction of a 'c'
\> converter from PEP 460 is also fine with me).
\>
\> How you would want to translate this proposal with the % operator I'm not
\> sure since it has been quite a while since I last seriously used it and so
\> I don't think I'm in a good position to propose a shift for it.
\> \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_
\> Python-Dev mailing list
\> Python-Dev@python.org
\> https://mail.python.org/mailman/listinfo/python-dev
\> Unsubscribe: https://mail.python.org/mailman/options/python-dev/yselivanov.ml%40gmail.com
\>
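For comparison, the two rules in the quoted proposal could be emulated roughly as follows. This is a sketch, not anyone's actual implementation: bytes\_format and to\_bytes are hypothetical names, only auto-numbered '{}' fields are handled, and I am assuming that "a format spec is present" is what triggers the ASCII-encoding step. It uses string.Formatter.parse from the standard library to split the template.

```python
from string import Formatter

def to_bytes(value):
    # The 'b' conversion as described: use a buffer if there is one,
    # else __bytes__. A str has neither, so it is rejected.
    if isinstance(value, (bytes, bytearray, memoryview)):
        return bytes(value)
    dunder = getattr(type(value), '__bytes__', None)
    if dunder is None:
        raise TypeError('%s has neither a buffer nor __bytes__'
                        % type(value).__name__)
    return dunder(value)

def bytes_format(template, *args):
    # Hypothetical stand-in for the proposed bytes.format(); not a real API.
    values = iter(args)
    out = []
    # latin-1 round-trips every byte value, so the template survives parsing.
    for literal, field, spec, _conv in Formatter().parse(template.decode('latin-1')):
        out.append(literal.encode('latin-1'))
        if field is None:
            continue                      # trailing literal, no field
        value = next(values)
        if spec:
            # Point 2: format first, then encode the text as strict ASCII.
            out.append(format(value, spec).encode('ascii', 'strict'))
        else:
            # Point 1: default 'b' conversion, bytes only.
            out.append(to_bytes(value))
    return b''.join(out)
```

Here a failure depends on the type of the argument or on the explicit format spec: bytes\_format(b'Hello, {}', 'Guido') raises TypeError whatever the string contains, while bytes\_format(b'I have {:d} bottles', 10) encodes the formatted number.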
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov