[Python-Dev] PEP 460: allowing %d and %f and mojibake (original) (raw)

Victor Stinner victor.stinner at gmail.com
Sun Jan 12 02:01:01 CET 2014


Hi,

2014/1/11 Antoine Pitrou <solipsis at pitrou.net>:

b'x=%s' % 10 is well defined, it's pure bytes. It is well-defined? Then please explain me what the general case of b'%s' % x is supposed to call: - does it call x.bytes? int.bytes doesn't exist - does it call bytes(x)? bytes(10) gives b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' - does it call x.str? you've reintroduced the Python 2 behaviour of conflating bytes and unicode

I don't want to call any method from bytes%args, only Py_buffer API would be used. So the pseudo-code becomes:

Or:

--

I discussed with Antoine to try to understand how and why we disagree.

Antoine prefers a pure API, whereas I'm trying to figure out if it would be possible to write code compatible with Python 2 and Python 3.

Using Antoine's PEP, it's possible to write code working on Python 2 and Python 3 which only manipulate bytes strings.

The problem is that it's a pain to write a code working on both Python versions when an argument is an integer. For example, the Python 2 code "Content-Length: %s\r\n" % 123 is written ("Content-Length: %s\r\n" % 123).encode('ascii') in Python 3. So Python 2 and Python 3 codes are different.

Supporting formating integers would allow to write b"Content-Length: %s\r\n" % 123, which would work on Python 2 and Python 3.

(u'Content-Length: %s\r\n' % 123).encode('ascii') works on both Python versions, but it may require more work to Python 2 code on Python 3.

--

Now I'm trying to find use cases in Mercurial and Twisted source code to see which features are required. First, I'm looking for a function requiring to format a number in decimal in a bytes string.

In issue #3982, I saw:

""" HTTP chunking' uses ASCII mixed with binary (octets). With 2.6 you could write:

def chunk(block): return b'{0:x}\r\n{1}\r\n'.format(len(block), block)" """

and

""" 'Content-length: {}\r\n'.format(length) """

But are the examples real use cases, or artifical examples?

--

Augie Fackler gave an example from Mercurial: """ sys.stdout.write('%(state)s %(path)s\n' % {'state': 'M', 'path': 'some/filesystem/path'})

except we don't know the encoding of the filesystem path (Hi unix!) so we have to treat the whole thing as opaque bytes. It's even more fun for 'log', becase then it's got localized strings in it as well. """

But here I disagree with the design of Mercurial, filenames should be treated as text. If a filename would be pure binary, you should not write it in a terminal. Displaying binary data usually leads to displaying random characters and changing terminal options (ex: text starts blinking or is displayed in bold!?) :-)

For the localized string: again, it's also a design issue in my opinion. A localized string is text, not binary data :-)

--

Another option is that I cannot find usecases because there are no use cases for the PEP 460 and the PEP is useless :-)

Victor



More information about the Python-Dev mailing list