[Python-Dev] Migration from Python 2.7 and bytes formatting

Stephen J. Turnbull stephen at xemacs.org
Sun Jan 19 03:27:12 CET 2014

Neil Schemenauer writes:

> That's it. After sleeping on it, I'm not sure that's enough Python 2.x compatibility to help a lot. I haven't ported much code to 3.x yet but I imagine the following are major challenges:
>
> * comparisons between str and bytes always returns unequal
>
> * indexing/iterating bytes returns integers, not bytes objects
>
> * concatenation of str and bytes fails (not so bad since a TypeError is generated right away).
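
For readers who have not met these yet, the three behaviours listed above can be reproduced in a few lines. This is only a sketch for a Python 3 interpreter run without the -b flag; the variable names are illustrative and do not come from the thread.

```python
# Python 3 behaviour for the three cases in the list above.
text = "abc"
data = b"abc"

# 1. str and bytes never compare equal (no exception by default, just False).
equal = (text == data)

# 2. Indexing or iterating bytes yields integers; slicing still yields bytes.
first_by_index = data[0]      # the integer code for "a"
first_by_slice = data[0:1]    # a bytes object of length 1
items = list(data)

# 3. Concatenating the two types raises TypeError immediately.
try:
    text + data
    concat_failed = False
except TypeError:
    concat_failed = True
```

The slice in step 2 is the usual spelling when Python 2 style one-byte strings are wanted.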

Experience shows these are rarely major challenges. The reason we are having this discussion is that if you are the kind of programmer who runs into these challenges once, you are likely to run into all of the above and more, repeatedly, and addressing them using the features available in Python up to v3.3 makes your code unreadable.

In other words, it's like unemployment at 5%. It would be bearable (just) if the pain were shared, with 100% of the people being 5% unemployed. Instead, the burden falls on the 5% who are 100% unemployed.

Now, the problem that many existing libraries face is that they were designed for monolingual environments where text encodings are more or less ASCII compatible[1]. If you stay in the Python 2 world, you can "internationalize" with the existing design, more or less limp along, fixing encoding bugs as they arise (not "if" but "when", and it can take a decade to find them all). But Python 3 strongly discourages that policy. From the point of view of design for the modern environment, such libraries really should have their I/O modules rewritten from scratch (not a huge job), and the necessary adjustments made in processing code (few but randomly dispersed through the code, and each a ticking time bomb for your users). But I stress that the problem here is that the design of such libraries is at fault, not Python 3. The world has changed.[2]
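
To make "ASCII compatible" in the sense of footnote [1] concrete, here is a small sketch. The sample character and the two codecs are assumptions chosen for illustration; nothing here is taken from any particular library.

```python
# Illustrative check of the property described in footnote [1]:
# does an ASCII-range byte ever appear inside a non-ASCII character?
sample = "\u30bd"  # KATAKANA LETTER SO, chosen only as an example

utf8 = sample.encode("utf-8")
sjis = sample.encode("shift_jis")

def has_ascii_byte(encoded):
    """Return True if any byte of the encoded form is in the ASCII range."""
    return any(b < 0x80 for b in encoded)

# UTF-8 uses only bytes >= 0x80 for non-ASCII characters.
utf8_safe = not has_ascii_byte(utf8)

# Shift_JIS encodes this character with 0x5C (ASCII backslash) as trail byte.
sjis_safe = not has_ascii_byte(sjis)

# Byte-level code that splits on a backslash cuts the character in half.
parts = sjis.split(b"\\")
```

Code that scans raw bytes for ASCII delimiters is safe on the first encoding and silently wrong on the second, which is the kind of bug that surfaces only when such data eventually arrives.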

And then there are the remaining 5% or so that really need to work mostly in bytes, but want to use string formatting to format their byte streams. I used to think that this was just a porting convenience, but I was wrong. Code written this way is often more concise and more readable than code written using .join() or the struct module. It should be written using string formatting. And that's what PEPs 460 and 461 are intended to address.
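
As a rough illustration of the readability point, compare the two spellings below. The sketch assumes an interpreter that implements bytes %-formatting as PEP 461 was eventually accepted (Python 3.5 and later); the header and payload are made up for the example.

```python
length = 11
body = b"hello world"

# Without bytes formatting: assemble the pieces by hand.
joined = b"".join([
    b"Content-Length: ",
    str(length).encode("ascii"),
    b"\r\n\r\n",
    body,
])

# With %-formatting on bytes (PEP 461): the wire layout is visible at a glance.
formatted = b"Content-Length: %d\r\n\r\n%s" % (length, body)
```

Both produce the same bytes; the second keeps the shape of the message in one place.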

We'll see what happens as these PEPs are implemented, but I suspect that we'll find that there are very few bandaids left that are of much use. That is, as I claimed above, for the remaining problematic libraries a redesign will be needed.

Footnotes:

[1] In the technical sense that you can rely on ASCII bytes to mean ASCII characters, not part of a non-ASCII character.

[2] And if the world hasn't changed for your application, what's wrong with staying with Python 2?
