[Python-Dev] PEP 460 reboot

Guido van Rossum guido at python.org
Tue Jan 14 19:52:05 CET 2014


On Tue, Jan 14, 2014 at 9:45 AM, Chris Barker <chris.barker at noaa.gov> wrote:

On Tue, Jan 14, 2014 at 9:29 AM, Yury Selivanov <yselivanov.ml at gmail.com> wrote:

- Try str(), and do ".encode('ascii', 'strict')" on the result.

please no -- that's the source of a lot of pain in py2 now. Having a failure as a result of the value, rather than the type, of an object just makes for hard-to-test bugs. Everything will be hunky dory for development and testing, then in deployment some idiot ( ;-) ) will pass in some non-ascii compatible string and you get failure. And the person who gets the failure doesn't understand why, or they wouldn't have passed in non-ascii values in the first place...

Ease of porting is nice, but let's not make it easy to port bug-prone code.

Right. This is a big red flag to me as well.
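
To make the concern concrete, here is a minimal sketch of the quoted proposal, assuming Python 3. The helper name `to_ascii_bytes` is hypothetical and exists only for illustration; it is not part of any proposal text.

```python
def to_ascii_bytes(obj):
    # Hypothetical helper mirroring the quoted idea:
    # call str(), then do a strict ASCII encode of the result.
    return str(obj).encode('ascii', 'strict')

# Same type (str), ASCII-only value: succeeds.
ok = to_ascii_bytes('hello')

# Same type (str), non-ASCII value: fails only at run time.
try:
    to_ascii_bytes('h\u00e9llo')
    failed = False
except UnicodeEncodeError:
    failed = True
```

Both calls pass a str, so no type-based check or test written against ASCII data would catch the second case; only the value decides whether it fails.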

I think there is some inherent conflict between the extensible design of str.format() and the practical needs of people who are actually going to use formatting operations (either % or .format()) with bytes.

The practical needs are mostly limited to supporting basic number formatting (decimal, hex, padding) and interpolation of anything that supports the buffer interface. It would also be nice if you didn't have to specify the type at all in the format string, i.e. {} should do the right thing for numbers and (all sorts of) bytes.
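
As a rough sketch of those practical needs, assuming Python 3: the function `bytes_field` and its `spec` parameter are hypothetical names for illustration, not a proposed API. Numbers go through the ordinary format spec and are encoded as ASCII; anything else must support the buffer interface.

```python
def bytes_field(value, spec=''):
    """Hypothetical helper: render one interpolated value as bytes."""
    if isinstance(value, (int, float)):
        # Number formatting yields only ASCII characters.
        return format(value, spec).encode('ascii')
    # memoryview() accepts any object supporting the buffer interface
    # and raises TypeError for everything else, including str.
    return bytes(memoryview(value))

hex_field = bytes_field(255, '04x')
raw_field = bytes_field(bytearray(b'abc'))

try:
    bytes_field('text')
    str_rejected = False
except TypeError:
    str_rejected = True
```

Here a str is rejected because of its type, regardless of its contents.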

But the way to arrive at this behavior without duplicating a whole lot of code seems to be to call the existing text-based format API and convert the result to bytes. For numbers this should be safe (their formatting produces just ASCII digits and a selected few other ASCII characters), but it leads to an undesirable outcome for other types -- not just str but also e.g. lists or dicts containing str instances, since those call repr() on the contained items, and repr() may produce non-ASCII characters.
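
A short illustration of that difference, assuming Python 3 and using only built-ins (the variable names are arbitrary):

```python
# Numbers: the text-based format result is pure ASCII, so encoding is safe.
number = format(1234, '>8d').encode('ascii')

# Containers: formatting a list falls back to str(), which calls repr()
# on each item, and repr() of a str keeps printable non-ASCII characters.
items = ['caf\u00e9']
text = format(items)

try:
    text.encode('ascii')
    container_ok = True
except UnicodeEncodeError:
    container_ok = False
```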

This is why my earlier proposal used ascii(), which is a "nerfed"(*) version of repr(). This does the right thing for numbers as well as for many other types (e.g. None, bool) and does something unpleasant for text strings that is perhaps better than the alternative.
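
For reference, this is how the built-in ascii() behaves on the kinds of values mentioned above (Python 3; the variable names are arbitrary):

```python
num = ascii(42)                  # same text as repr() for numbers
none = ascii(None)
flag = ascii(True)

# For text, ascii() keeps the quotes and escapes non-ASCII characters,
# so the result always encodes to ASCII but is rarely what one wants.
text = ascii('caf\u00e9')
nested = ascii(['caf\u00e9'])
as_bytes = text.encode('ascii')
```

The str case never raises, but the output carries quotes and a backslash escape (`'caf\xe9'`) rather than the original text.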

Which reminds me. Quite a few people have spoken out in favor of loud failures rather than silent "wrong" output. But I think that in the specific context of formatting output, there is a long and IMO good tradition of producing (slightly) wrong output in preference to stricter behavior. Consider for example what to do when a number doesn't fit in the given width. Would you rather raise an exception, truncate the value, or mess up the formatting? All languages newer than Fortran that I've used have chosen the last of these, and I still agree it's a good idea. Similarly for infinities, NaN, or None. (Yes, it's embarrassing to have a website displaying 'null'. But isn't a 500 even more embarrassing?)
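
Python's existing text formatting already follows this tradition, as a few built-in operations show (Python 3; the variable names are arbitrary):

```python
# A number wider than the requested field: the width is exceeded
# rather than raising or truncating.
too_wide = '%3d' % 123456
too_wide_new = '{:>3}'.format(123456)

# Non-finite floats and None are rendered as text, not rejected.
nan_field = format(float('nan'), '>6')
inf_field = format(float('inf'), '>6')
none_field = '%5s' % None
```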

This doesn't mean I'm insensitive to the argument in favor of loud and early failure. It's just that I can see both sides of the coin, and I'm still deciding which argument is more important.

(*) Gamer slang for a weapon made less dangerous. :-)

--Guido van Rossum (python.org/~guido)


