[Python-Dev] Python 3.x and bytes

Stephen J. Turnbull stephen at xemacs.org
Wed May 18 18:16:44 CEST 2011


Robert Collins writes:

It's probably too late to change, but please don't try to argue that it's correct: the continued confusion of folk running into this is evidence that confusion is happening. Treat that as evidence and think about how to fix it going forward.

Sorry, Rob, but you're just wrong here, and Nick is right. It's possible to improve Python 3, but not to "fix" it in this respect. The Python 3 solution is correct, the Python 2 approach is not. There's no way to avoid discontinuity and confusion here.

Confusion is indeed happening, but it's real confusion in the way people think about the problem space, not a language design cockup. The problem can't be solved by embedding ASCII in Unicode, because non-ASCII bytes don't have a canonical embedding in Unicode. I.e., the situation is inherently confusing. You can't wish it away, you can only choose to impose more or less of it on particular constituencies.
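
To make the "no canonical embedding" point concrete, here is a minimal sketch in Python 3. The sample byte strings and the helper name `decodings` are made up for illustration, and the three codecs are just a convenient selection of ASCII-compatible 8-bit encodings. ASCII bytes come out the same whichever codec you pick; the non-ASCII bytes come out as a different string for each one, and strict UTF-8 refuses them outright.

```python
# Hypothetical sample data, chosen only for illustration.
ASCII_BYTES = b"abc"
NON_ASCII_BYTES = b"\xe4\xf6"


def decodings(data, codecs=("latin-1", "cp1251", "koi8-r")):
    """Decode the same bytes under several ASCII-compatible 8-bit codecs."""
    return {name: data.decode(name) for name in codecs}


# Every codec agrees on what the ASCII bytes mean.
ascii_views = decodings(ASCII_BYTES)

# Each codec maps the high bytes to different code points.
non_ascii_views = decodings(NON_ASCII_BYTES)

# Strict UTF-8 rejects this particular byte sequence entirely.
try:
    NON_ASCII_BYTES.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

for name, text in non_ascii_views.items():
    print(name, ascii(text))
```

None of the three answers is more "right" than the others without out-of-band knowledge of the encoding, which is the sense in which the bytes have no canonical embedding.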

Now, it's quite possible that there are other correct approaches that allow straightforward manipulation of non-ASCII text, but I don't know what they are, and I don't know anybody else who does.


