[Python-Dev] PEP 460 reboot

Glenn Linderman v+python at g.nevcal.com
Tue Jan 14 07:37:17 CET 2014


On 1/13/2014 9:25 PM, Nick Coghlan wrote:

> since this observation makes it clear that there's no coherent way to offer a pure binary interpolation API - the only general purpose combination mechanism for segments of binary data that can avoid making assumptions about the encodings of metacharacters is simple concatenation.

That's almost true, and I'm glad that you, Guido, and all of us can understand that the currently defined Python 2 and Python 3 formatting syntaxes contain an inherent ASCII assumption, just like many internet protocols. The bitter fight is over :)
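The ASCII assumption is easy to see in the bytes %-formatting that Python 3.5 later added via PEP 461. That feature did not exist when this was written, so the snippet below is a retrospective illustration that assumes Python 3.5 or newer. It shows that the "%" metacharacter is recognized only as the byte 0x25, that numbers come out as ASCII digits, and that str arguments are refused rather than encoded with a guessed codec.

```python
# Assumes Python 3.5+ (bytes %-formatting, PEP 461).

# 1. The formatter looks for the byte 0x25 (ASCII "%") and emits ASCII digits.
ascii_result = b"%d items" % 42

# 2. A format string in a non-ASCII-compatible encoding (UTF-16-LE) is not
#    understood: "%" is followed by a NUL byte, an unsupported conversion.
try:
    "%d".encode("utf-16-le") % 42
    utf16_rejected = False
except ValueError:
    utf16_rejected = True

# 3. A str argument is refused instead of being encoded with some default codec.
try:
    b"%s" % "text"
    str_refused = False
except TypeError:
    str_refused = True

print(ascii_result, utf16_rejected, str_refused)
```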

However, your statement above isn't 100% accurate, so, just for the pedantry of it, I'll point out why. A mechanism could be defined in which the "format string" contains only format specifications, and any other text is treated as an error. The format string could have an explicit or a defined encoding, so there would be no need to make an assumption about its encoding. And since it would contain no text other than format specifications, it would serve only as a rule-book for interpreting the parameters, contributing no text of its own to the result.

This wouldn't solve the problem at hand, though, which is to provide a nice migration path from Python 2 to Python 3 for code that uses ASCII-based format strings that do contribute text as well as include parameter data.

Whether such a technique would be more useful than simple concatenation (or more elaborate concatenation such as join) remains to be seen, and possibly discussed if anyone is interested; that discussion would probably belong on python-ideas, though, since it would not address an immediate porting issue.
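For comparison, the concatenation baseline needs no format string at all: the caller encodes every piece explicitly and joins the results with bytes.join. The input values below are hypothetical, chosen only for illustration, and KOI8-R is used because Python's standard codecs do not include KOI-7.

```python
# Hypothetical inputs, for illustration only.
cyrillic = "Привет"
japanese = "こんにちは"
blob = bytes([0xDE, 0xAD, 0xBE, 0xEF])

# Every piece is encoded explicitly by the caller and the pieces are joined;
# no byte of the result comes from a format string.
record = b"".join([
    cyrillic.encode("koi8_r"),
    b"\x00",
    str(len(blob)).encode("big5"),  # big5 is ASCII-compatible, so this is b"4"
    b"\x00",
    japanese.encode("shift_jis"),
    b"\x00\x00",
    blob,
])
print(record.hex())
```

Nothing here is hidden, but the layout of the record is spread across the call rather than stated in one place.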

Assuming an ASCII-in-bytes format string (but one that contributes no text to the result), one could write something like

b"%{koi7}s%{00}v%{big5}d%{00}v%{ShiftJIS}s%{0000}v%b" / ( cyrillic, len( blob ), japanese, blob )

So the encodings to be applied to each of the input parameters could be explicitly specified.

The %{00}v specifiers would interpolate literal bytes into the output, expressed in the format string as ASCII hex, two characters per byte; the example puts a null byte or two between items in the output. Note that the length would be rendered with Chinese digits in the big5 encoding, although I don't know whether Chinese writers even use their own digits rather than ASCII ones these days, or what base they use. (I guess it was the Babylonians who used base 60, from which our timekeeping and angular measures were derived.)
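A minimal sketch of how such a rule-book format could be interpreted, written as a function because a bytes literal does not support the "/" operator used in the example. Everything here is hypothetical: the name binterp, the exact specifier grammar, and the error messages are assumptions made for illustration. KOI8-R stands in for KOI-7 (not among Python's standard codecs), and %d simply renders the number with str() before encoding it, so it does not attempt the Chinese-digit rendering speculated about above.

```python
import re

# Hypothetical specifier grammar (an illustration, not an existing API):
#   %{codec}s  str argument, encoded with the named codec
#   %{codec}d  int argument, rendered with str() and encoded with the codec
#   %{hex}v    literal bytes written as hex, two characters per byte; no argument
#   %b         bytes-like argument, copied through unchanged
_SPEC = re.compile(rb"%(?:\{([^}]*)\})?([sdvb])")


def binterp(fmt: bytes, *args) -> bytes:
    """Build a bytes result purely from the specifiers in fmt and from args.

    fmt acts only as a rule-book: any byte outside a specifier is an error,
    so the format string contributes no text of its own.
    """
    out = []
    pos = 0   # end offset of the previous specifier
    argi = 0  # index of the next unconsumed argument
    for m in _SPEC.finditer(fmt):
        if m.start() != pos:
            raise ValueError(f"literal text at offset {pos} is not allowed")
        pos = m.end()
        param = None if m.group(1) is None else m.group(1).decode("ascii")
        conv = m.group(2)
        if conv == b"v":
            if param is None:
                raise ValueError("%v needs a hex payload, e.g. %{00}v")
            out.append(bytes.fromhex(param))
            continue
        if argi >= len(args):
            raise TypeError("not enough arguments for format")
        arg = args[argi]
        argi += 1
        if conv == b"b":
            if param is not None:
                raise ValueError("%b takes no encoding")
            out.append(bytes(memoryview(arg)))  # rejects non-bytes-like args
        elif param is None:
            raise ValueError("%s and %d need an encoding, e.g. %{utf-8}s")
        elif conv == b"s":
            out.append(arg.encode(param))
        else:  # conv == b"d"
            out.append(str(int(arg)).encode(param))
    if pos != len(fmt):
        raise ValueError(f"literal text at offset {pos} is not allowed")
    if argi != len(args):
        raise TypeError("not all arguments converted")
    return b"".join(out)


# Usage, mirroring the example format string (KOI8-R in place of KOI-7).
cyrillic = "Привет"
japanese = "こんにちは"
blob = bytes([0xDE, 0xAD, 0xBE, 0xEF])
packed = binterp(
    b"%{koi8_r}s%{00}v%{big5}d%{00}v%{shift_jis}s%{0000}v%b",
    cyrillic, len(blob), japanese, blob,
)
print(packed.hex())
```

Rejecting any stray literal bytes is what keeps the format string free of encoding assumptions: the only ASCII it relies on is the specifier syntax itself, which never reaches the output.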

So there could be a coherent way to offer an interpolation mechanism that is pure binary and allows selection of the encoding of str data, if and as needed. One specifier could even name an encoding to apply to any format specifiers that don't include one, so in the typical case of producing output in a single language, the appropriate encoding could be set at the beginning of the format specification and overridden by particular specifiers if need be. But while such an interpolation mechanism could exist, it isn't compatible with Python 2, and the jury is still out on whether it would be sufficiently more useful than concatenation to be worth implementing. A different operator might be required, or the whole thing could be a function instead of an operator, with a similar format specification or one more like the format mini-language used by format() in Python 3.
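The default-encoding idea can be sketched the same way. The spelling %{codec}= for "set the default codec" is invented here purely for illustration, as is the function name binterp_default; only %s items are handled, to keep the example short.

```python
import re

# Hypothetical spellings, invented for this sketch:
#   %{codec}=  set the default codec for later specifiers; consumes no argument
#   %s         str argument, encoded with the current default codec
#   %{codec}s  str argument, encoded with the named codec (overrides the default)
_SPEC = re.compile(rb"%(?:\{([^}]*)\})?([s=])")


def binterp_default(fmt: bytes, *args: str) -> bytes:
    out = []
    pos = 0
    default = None
    pending = list(args)
    for m in _SPEC.finditer(fmt):
        if m.start() != pos:
            raise ValueError(f"literal text at offset {pos} is not allowed")
        pos = m.end()
        codec = None if m.group(1) is None else m.group(1).decode("ascii")
        if m.group(2) == b"=":
            if not codec:
                raise ValueError("%= needs a codec, e.g. %{utf-8}=")
            default = codec  # applies to every later specifier without a codec
            continue
        encoding = codec or default  # a per-item codec overrides the default
        if encoding is None:
            raise ValueError("no codec given and no default set")
        if not pending:
            raise TypeError("not enough arguments for format")
        out.append(pending.pop(0).encode(encoding))
    if pos != len(fmt):
        raise ValueError(f"literal text at offset {pos} is not allowed")
    if pending:
        raise TypeError("not all arguments converted")
    return b"".join(out)


# A Japanese default set once, overridden for a single UTF-8 item.
result = binterp_default(
    b"%{shift_jis}=%s%{utf-8}s%s",
    "こんにちは", "é", "さようなら",
)
print(result.hex())
```

Whether the default belongs in the format string, as here, or in a keyword argument of the function is exactly the kind of design question that would need discussion before such a mechanism was worth implementing.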


