[Python-Dev] PEP 460 reboot

Nick Coghlan ncoghlan at gmail.com
Tue Jan 14 13:46:25 CET 2014


On 14 January 2014 19:54, Stephen J. Turnbull <stephen at xemacs.org> wrote:

> Guido van Rossum writes:
>
> > And that is precisely my point. When you're using a format string,
> > all of the format string (not just the part between { and }) had
> > better use ASCII or an ASCII superset. And this (rightly)
> > constrains the output to an ASCII superset as well.
>
> Except that if you interpolate something like Shift JIS, much of the ASCII really isn't ASCII. That's a general issue, of course, if you do something that requires iterated format strings, but it's far more likely to appear to work most of the time with those encodings. Of course you can say "if it hurts, don't do that", but ....

Right, that's the danger I was worried about, but the problem is that there's at least some minimum level of ASCII compatibility that needs to be assumed in order to define an interpolation format at all (this is the point I originally missed). For printf-style formatting, it's % along with the various formatting characters and other syntax (like digits, parentheses, variable names and "."); with the format method, it's braces, brackets, colons, variable names, etc. The mini-language parser has to assume an encoding in order to interpret the format string, and that's all done assuming an ASCII compatible format string (which must make life interesting if you try to use an ASCII incompatible coding cookie for your source code - I'm actually not sure what the full implications of that are for bytes literals in Python 3).
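
To make the Shift JIS concern concrete, here is a minimal sketch (the variable names are purely illustrative, and it relies only on the standard shift_jis codec). It shows a two-byte character whose trail byte has the same value as an ASCII backslash, so anything scanning the raw bytes for ASCII syntax characters can be fooled. Shift JIS trail bytes start at 0x40, so characters such as "{" and "}" can be hit in the same way.

```python
# Sketch only: a byte that looks like ASCII inside Shift JIS data is not
# necessarily an ASCII character.
text = "表"  # U+8868, a two-byte character in Shift JIS
encoded = text.encode("shift_jis")

# The second (trail) byte of this character is 0x5C, the ASCII backslash.
trail_is_backslash = encoded[1:] == b"\\"

# A byte-level scan cannot tell the trail byte from a real backslash...
naive_hit = b"\\" in encoded

# ...whereas decoding first and scanning the text gives the right answer.
text_hit = "\\" in encoded.decode("shift_jis")

print(encoded, trail_is_backslash, naive_hit, text_hit)
```

A parser that works on bytes and assumes ASCII compatibility behaves like the naive scan here, which is why interpolated data in such encodings can appear to work most of the time and then break.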

The one remaining way I could potentially see a formatb method working is along the lines of what Glenn (I think) suggested: just like struct definitions, the formatb specifier would have to consist solely of substitution fields. However, that's getting awfully close to being just an alternate spelling for the struct module or bytes.join at that point, which hardly makes for a compelling case to add two new methods to a builtin type.
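
For comparison, this is roughly what the existing spellings look like when there is no literal text in the template at all. It's a sketch with made-up field values, not a proposal for how formatb would behave.

```python
import struct

# Hypothetical fields for illustration: a 3-byte tag and a 32-bit length.
tag = b"PNG"
length = 13

# struct: the format string consists solely of field specifiers
# (big-endian, 3-byte string, unsigned 32-bit int), with no literal text.
packed = struct.pack(">3sI", tag, length)

# bytes.join: plain concatenation of pieces that are already bytes.
joined = b"".join([tag, length.to_bytes(4, "big")])

print(packed, joined)
```

Both produce the same bytes, which is the sense in which a substitution-fields-only formatb would largely duplicate what is already available.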

Given that one of the concepts with the Python 3 transition was to take certain problematic constructs (like ASCII compatible interpolation directly to binary without a separate encoding step) away and decide whether or not we were happy to live without them, I think this one has proven to have sufficient staying power to finally bring it back in Python 3.5 (especially given the gain in lowering the barrier to porting Python 2 code that makes heavy use of interpolation to ASCII compatible binary formats).
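
As an illustration of the kind of code this is aimed at, here is a small sketch that assumes an interpreter where bytes objects support printf-style interpolation (Python 3.5 or later). The HTTP-style message is only an example of an ASCII compatible binary format.

```python
# Sketch only: assumes printf-style formatting on bytes is available.
body = b"hello"

# The template is ASCII compatible bytes; the payload stays binary.
status = b"%s %d %s\r\n" % (b"HTTP/1.1", 200, b"OK")
header = b"Content-Length: %d\r\n" % len(body)
message = status + header + b"\r\n" + body

# Text is not encoded implicitly: interpolating a str raises TypeError,
# so the caller still has to choose an encoding explicitly.
try:
    b"%s" % "text"
    str_rejected = False
except TypeError:
    str_rejected = True

print(message, str_rejected)
```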

It's certainly a decision that has its downsides, with the potential impact on users of ASCII incompatible encodings (mostly in Asia) being the main one, but I think the increased convenience in working with ASCII compatible binary protocols and file formats is worth the cost.

Cheers, Nick.

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia


