[Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5

Nick Coghlan ncoghlan at gmail.com
Thu Jan 9 11:32:26 CET 2014


On 9 Jan 2014 11:29, "INADA Naoki" <songofacandy at gmail.com> wrote:

And I think everyone was well intentioned - and python3 covers most of the bases, but working with binary data is not only a "wire-protocol programmer's" problem.

If you're working with binary data, use the binary API offered by bytes, bytearray and memoryview.
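As a rough illustration of what that binary API looks like in practice, here is a minimal sketch. The packet layout (a 2-byte big-endian length followed by a payload) is a made-up example, not any particular protocol.

```python
import struct

# Hypothetical packet: 2-byte big-endian length, then the payload bytes.
packet = struct.pack(">H", 4) + b"\xde\xad\xbe\xef"

# bytes is immutable; slicing gives bytes, indexing gives integers.
length = struct.unpack(">H", packet[:2])[0]

# memoryview lets you slice the buffer without copying it.
view = memoryview(packet)
payload = view[2:2 + length]

# bytearray is the mutable counterpart, useful for patching data in place.
buf = bytearray(packet)
buf[2] = 0x00

print(length, payload.tobytes(), bytes(buf))
```

None of this involves a text encoding at any point, which is the property that matters for purely binary data.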

Needing a library to wrap bytesthing.format('ascii', 'surrogateescape') or some such thing makes python3 less approachable for those who haven't learned that yet - which was almost all of us at some point when we started programming.

Totally agree with you.

If you're on a relatively modern OS, everything should be UTF-8 and you should be fine as a beginner.

When you start encountering malformed data, Python 3 should throw an error, and provide an opportunity to learn more (by looking up the error message), where Python 2 would silently corrupt the data stream.
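A small sketch of the Python 3 side of that behaviour, assuming the malformed input is Latin-1 data that is being read as UTF-8 (the sample bytes are invented for illustration):

```python
# Latin-1 encoding of "café"; the trailing 0xE9 is not valid UTF-8.
malformed = b"caf\xe9"

try:
    malformed.decode("utf-8")
    error_name = None
except UnicodeDecodeError as exc:
    # The error names the codec and the offending position,
    # which gives a beginner something concrete to look up.
    error_name = type(exc).__name__
    print(exc)

# Once you know what the data really is, the fix is an explicit choice:
as_latin1 = malformed.decode("latin-1")

# Or, if the bytes must survive a round trip untouched, surrogateescape
# smuggles the undecodable byte through as a lone surrogate.
escaped = malformed.decode("utf-8", errors="surrogateescape")
roundtrip = escaped.encode("utf-8", errors="surrogateescape")
```

Either way the decision about what to do with the bad byte is visible in the code rather than made implicitly.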

Python 2 enshrined a data model eminently suitable for boundary code that dealt with ASCII compatible binary protocols (like web frameworks) as the default text model. Application code then needed to take special steps to get correct behaviour for the full Unicode range. In essence, the Python 2 text model is the POSIX text model with Unicode support bolted on to the side to make it at least possible to write correct application code.

This is completely backwards. Web applications vastly outnumber web frameworks, and the same goes for every other domain: applications are vastly more common than the libraries and frameworks that handle data transformations at system boundaries on their behalf, so making the latter easier to write at the expense of the former is a deeply flawed design choice.

So Python 3 reverses the situation: the core text model is now more appropriate for the central application code, after the boundary code has cleaned up the murky details of wire protocols and file formats.

This is pretty easy to deal with for new Python 3 code, since you just write things to deal with either bytes or text as appropriate.
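To make the split between boundary code and application code concrete, here is a hedged sketch. The function names and the ASCII line-based "protocol" are assumptions made up for the example.

```python
def read_request_line(raw: bytes) -> str:
    """Boundary code: turn one wire-format line into text.

    ASCII is an assumption here; a real protocol defines its own encoding.
    """
    return raw.rstrip(b"\r\n").decode("ascii")


def shout(text: str) -> str:
    """Application code: only ever sees str, never bytes."""
    return text.upper()


def write_response_line(text: str) -> bytes:
    """Boundary code: turn text back into wire format."""
    return text.encode("ascii") + b"\r\n"


wire_in = b"hello world\r\n"
wire_out = write_response_line(shout(read_request_line(wire_in)))
print(wire_out)
```

The encoding details live in the two boundary functions, so the middle function does not need to know that a wire format exists at all.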

However, there is some code written for Python 2 that relies more heavily on the ability to treat ASCII compatible binary data as both binary data and as text. This is the use case that Python 3 treats as a more specialised use case (perhaps benefitting from a specialised third party type), whereas Python 2 supports it by default.

This is also the use case that relied most heavily on implicit encoding and decoding, since that's the mechanism that allows the 8-bit and Unicode paths to share string literals.
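For illustration, this is roughly what the loss of implicit conversion looks like from Python 3 (the header value is an arbitrary example). In Python 2, mixing an 8-bit str with a unicode object would implicitly decode the 8-bit side as ASCII; in Python 3, mixing bytes and str is simply a TypeError.

```python
header = b"Content-Length: "

try:
    header + "42"  # bytes + str: no implicit encoding in Python 3
    mixed_ok = True
except TypeError:
    mixed_ok = False

# The conversion has to be spelled out on one side or the other.
explicit = header + "42".encode("ascii")
print(mixed_ok, explicit)
```

This is why code that shared string literals between its 8-bit and Unicode paths needs the most rework when ported.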

Cheers, Nick.

-- INADA Naoki <songofacandy at gmail.com>

