[Python-Dev] PEP 414 - Unicode Literals for Python 3

Nick Coghlan ncoghlan at gmail.com
Tue Feb 28 02:45:48 CET 2012


On Tue, Feb 28, 2012 at 9:19 AM, Terry Reedy <tjreedy at udel.edu> wrote:

Since writing the above, I realized that the following is a realistic scenario. 2.6 or 2.7 code a) uses has/set/getattr, so unicode literals would require a change; b) uses non-ascii chars in unicode literals; c) uses (or could be converted to use) print as a function; and d) otherwise uses a common 2-3 subset. Such code would only need the u prefix to be allowed in order to run under both Pythons. This works the other way, of course, for backporting code. So I am replacing 'most' with 'some unknown-to-me fraction' ;-).

Yep, that's exactly the situation I'm in with PulpDist (a web app that primarily targets deployment on RHEL 6, which means Python 2.6). Since I preformat all my print output with either str.format or str.join (or use the logging module) and always use "except exc as var" to catch exceptions, the natural way to write Python 2 code for me is almost source compatible with Python 3. The only big discrepancy I'm currently aware of? Unicode literals.
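To make that style concrete, here is a minimal sketch (illustrative only, not taken from PulpDist; the names `describe`, `parse_count`, `NAME` and the logger name are hypothetical). It assumes Python 2.6+ or Python 3.3+, and relies only on constructs the two share:

```python
import logging

log = logging.getLogger("example")  # hypothetical logger name


def describe(name, count):
    # Preformat with str.format so print() only ever receives one string;
    # print("...") with a single argument behaves the same on 2.6+ and 3.x.
    return u"{0}: {1} item(s)".format(name, count)


def parse_count(text):
    # "except ... as ..." is accepted by Python 2.6+ and by Python 3.
    try:
        return int(text)
    except ValueError as exc:
        log.warning("bad count %r: %s", text, exc)
        return 0


# The u prefix is the only construct here that Python 3.0-3.2 reject.
NAME = u"caf\u00e9"  # non-ASCII text, written as an escape to stay encoding-neutral

summary = describe(NAME, parse_count("3"))
print(len(summary))
```

Everything except the two u"" literals already parses on Python 3.0 through 3.2, which is why the literal prefix is the sticking point.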

Now, I could retrofit the entire code base with the unicode_literals import and str("") for native strings, but that has problems of its own:

Basically, using the unicode_literals import would significantly raise the barrier to entry for PulpDist as a Python 2 project, as well as forcing me to switch mental models for text processing whenever I have to look at the code in a dependency during a debugging session. Therefore, given that Python 2 will be my primary target for the immediate future (and any collaborators are likely to be RHEL 6 and hence Python 2 focused), I don't want to use that particular future import. The downside of that choice (currently) is that it kills any possibility of running any of it on Python 3, even the command line client or the web front end after Django gets ported. With explicit unicode literals being restored in Python 3.3, though, I'm a lot more optimistic about the feasibility of porting it without too much effort (as well as the prospect of other Django app dependencies gaining Python 3 support).
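For comparison, a small sketch of the three spellings being weighed here, placed in one module for illustration (the variable names are hypothetical, and it assumes Python 2.6+ or Python 3.3+). The future import has to be the first statement in the module and silently changes the type of every unprefixed literal under Python 2:

```python
from __future__ import unicode_literals  # affects every unprefixed literal in this module

TEXT_TYPE = type(u"")  # unicode on Python 2, str on Python 3

label = "status"        # text on both versions, but only because of the import
native = str("status")  # native str: bytes on Python 2, text on Python 3
explicit = u"status"    # explicit spelling: text on 2.x and 3.3+, import or not

assert isinstance(label, TEXT_TYPE)
assert isinstance(native, str)
assert isinstance(explicit, TEXT_TYPE)
```

With the explicit prefix, a reader can tell the type of a literal from the line itself rather than from an import at the top of the file.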

In terms of third party upstreams, Python 3 compatibility patches that affect every single string literal in the entire project (either directly or by converting the entire project to the "unicode_literals" import) aren't likely to even get reviewed, let alone accepted. By contrast (for a project that already only supports 2.6+), cleaning up print statements and exception handling should be a much smaller patch that is easy to both review and accept. Making it as easy as possible for maintainers who don't really care about Python 3 to accept patches from people who do care is a very good thing.

There are still other problems that are going to affect the folks playing at the wire protocol level, but the lack of unicode literals is a big one that affects the entire application stack.

Cheers, Nick.

-- Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
