[Python-Dev] PEP 460 reboot

Greg Ewing greg.ewing at canterbury.ac.nz
Mon Jan 13 22:54:56 CET 2014


Nick Coghlan wrote:

> By allowing format characters that do assume ASCII, the entire construct is rendered unsafe - you have to look inside the format string to determine if it is assuming ASCII compatibility or not, thus the entire construct must be deemed as assuming ASCII compatibility at the level of static semantic analysis.

I don't see how any of the currently proposed formatting operations make a data-dependent ASCII assumption.

When you write b"%d" % x, you're not assuming that x is ASCII, you're assuming that it's an integer. The %d conversion of an integer is defined to produce only ASCII characters, and it works on any integer, so there's no data-dependent assumption there.
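To make that concrete, here is a minimal sketch. It assumes an interpreter that implements %d for bytes formatting along the lines being proposed; the sample values are arbitrary and chosen only for illustration.

```python
# Sketch: %d depends on x being an integer, not on the value of x.
# Assumes bytes % formatting with %d is available, as proposed.
samples = [0, -1, 42, 2**64, -10**30]

formatted = [b"%d" % x for x in samples]

for out in formatted:
    # Every byte of the result is in the ASCII range, whatever the integer.
    assert all(byte < 128 for byte in out)
```

The check passes for every integer tried, which is the sense in which the operation is type-dependent rather than data-dependent.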

Something that would involve such an assumption would be if b"%s" % 'hello' were defined to encode 'hello' as ASCII. But Guido has proposed not doing that, and instead interpolating ascii('hello'). Since ascii() is defined to return only ASCII characters, and works on any string, there is again no data-dependent assumption.
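A short sketch of the ascii() property relied on here. It only exercises the builtin itself, not any bytes formatting, and the sample strings are arbitrary.

```python
# Sketch: ascii() accepts any str and returns only ASCII characters,
# escaping everything else, so encoding its result as ASCII cannot fail.
samples = ["hello", "h\u00e9llo", "\u65e5\u672c\u8a9e", "\udc80", "\x00\xff"]

escaped = [ascii(s) for s in samples]

for text in escaped:
    assert all(ord(ch) < 128 for ch in text)
    text.encode("ascii")  # does not raise for any of the samples
```

Note that the result includes the quotes and escapes of the repr, so interpolating ascii('hello') yields 'hello' with its quote characters, not the bare word.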

My preference would be for b"%s" % 'hello' to raise an exception, but that would still be data-independent.
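As a rough illustration of why raising would still be data-independent, here is a sketch using a hypothetical helper, interpolate_s, standing in for what %s might do inside a bytes format. The name and the error message are made up for this example and are not part of any proposal.

```python
def interpolate_s(value):
    """Hypothetical stand-in for %s handling in a bytes format."""
    if not isinstance(value, (bytes, bytearray)):
        # Rejected purely on type; the contents are never inspected.
        raise TypeError("%s in a bytes format requires bytes, not "
                        + type(value).__name__)
    return bytes(value)


def rejects(value):
    """Return True if interpolate_s refuses the value."""
    try:
        interpolate_s(value)
    except TypeError:
        return True
    return False
```

Under this sketch, rejects('hello') and rejects('h\u00e9llo') are both True, while any bytes object is accepted regardless of which byte values it contains.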

As for having to look inside the format string to know what types are expected, that's no different from any other formatting operation. All it means is that static type analysis in Python is hard, but we already knew that.

> Allowing these ASCII assuming format codes in the core bytes interpolation introduces exactly the same problem as is present in the Python 2 text model: code that appears to support arbitrary binary data, but is in fact assuming ASCII compatibility.

Can you provide an example of code using Guido's currently approved formatting semantics that would fail when given arbitrary binary data? I don't see how it can happen.

-- Greg


