[Python-Dev] Python-3.0, unicode, and os.environ

Toshio Kuratomi a.badger at gmail.com
Thu Dec 4 22:15:42 CET 2008


Adam Olsen wrote:

On Thu, Dec 4, 2008 at 1:02 PM, Toshio Kuratomi <a.badger at gmail.com> wrote:

I opened up bug http://bugs.python.org/issue4006 a while ago and it was suggested in the report that it's not a bug but a feature and so I should come here to see about getting the feature changed :-)

I have a specific problem with os.environ and a somewhat less important architectural issue with the unicode/bytes handling in certain os.* modules. I'll start with the important one:

Currently in python3 there's no way to get at environment variables that are not encoded in the system default encoding. My understanding is that this isn't a problem on Windows systems but on *nix this is a huge problem. Environment variables on *nix are a sequence of non-null bytes. These bytes are almost always "characters" but they do not have to be. Further, there is nothing that requires that the characters be in the same encoding; some of the characters could be in the UTF-8 character set while others are in latin-1, shift-jis, or big-5.

Multiple encoding environments are best described as "batshit insane". It's impossible to handle any of it correctly as text, which is why UTF-8 is becoming a universal standard. For everybody's sanity python should continue to push it.

Amen brother!
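To make the mixed-encoding point concrete, here is a minimal sketch. The PATH-like value and the `try_decode` helper are made up for illustration; nothing here touches the real environment. One component is encoded as UTF-8 and the other as latin-1, so no single codec recovers the original text, while the raw bytes stay intact.

```python
# Hypothetical PATH-like value: the same word "cafe" with an accented e
# (U+00E9), once encoded as UTF-8 and once as latin-1.
utf8_part = "/home/caf\u00e9/bin".encode("utf-8")
latin1_part = "/opt/caf\u00e9/bin".encode("latin-1")
raw_value = utf8_part + b":" + latin1_part


def try_decode(raw, encoding):
    """Return the decoded text, or None if the bytes are invalid in `encoding`."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# Strict UTF-8 decoding fails because of the lone latin-1 byte 0xE9.
as_utf8 = try_decode(raw_value, "utf-8")

# latin-1 accepts any byte sequence, but garbles the UTF-8 component.
as_latin1 = try_decode(raw_value, "latin-1")

# Treated as opaque bytes, the value still splits cleanly on the separator.
components = raw_value.split(b":")
```

Decoding as UTF-8 fails outright, and decoding as latin-1 "succeeds" with the wrong characters in the first component. Only the bytes view round-trips without loss.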

However, some pragmatism is also possible.

Unfortunately, this is exactly what I'm talking about :-)

Many uses of PATH may allow it to be treated as black-box bytes, rather than text. The minimal solution I see is to make os.getenv() and os.putenv() switch to bytes mode when given bytes arguments, as os.listdir() does. This use case doesn't require the ability to iterate over all environment variables, as os.environb would allow.

This would be a partial implementation of my option #3. It allows the programmer to work around problems but does allow subtle bugs to creep in unawares. For instance::

I do wonder if controlling the environment given to a subprocess requires os.environb, but it may be too obscure to really matter.

If you wanted to change one variable before passing it on to the subprocess this could lead to head-scratcher bugs. Here's a contrived example:

Say I have an app that talks to multiple cvs repositories. It copies os.environ and modifies CVSROOT and CVS_RSH then calls subprocess with env=temp_env. If the PATH variable contains non-decodable elements on some machines, this could lead to mysterious failures. This is particularly bad because we aren't directly modifying PATH anywhere in our code so there won't be an obvious reason in the code that this is failing.
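The contrived CVS scenario can be sketched roughly as follows. This is a simulation, not a reproduction of any particular Python release: `text_view` is a hypothetical helper that assumes undecodable variables are simply absent from the text view of the environment, and `raw_env`, the CVSROOT value, and the paths are invented for illustration. No subprocess is actually started.

```python
def text_view(raw_env, encoding="utf-8"):
    """Hypothetical text view of a bytes environment.

    Assumption for illustration: entries that do not decode are
    silently skipped rather than raising an error.
    """
    decoded = {}
    for key, value in raw_env.items():
        try:
            decoded[key.decode(encoding)] = value.decode(encoding)
        except UnicodeDecodeError:
            continue  # the variable silently disappears
    return decoded


# Made-up raw environment: PATH has one latin-1 encoded component.
raw_env = {
    b"PATH": b"/usr/bin:/opt/caf\xe9/bin",
    b"HOME": b"/home/user",
}

# The application copies the text view and only touches the CVS variables.
temp_env = dict(text_view(raw_env))
temp_env["CVSROOT"] = ":ext:user@cvs.example.org:/cvsroot"
temp_env["CVS_RSH"] = "ssh"

# A call such as subprocess.call(["cvs", "update"], env=temp_env) would now
# run without PATH, although the code never modified PATH itself.
path_survived = "PATH" in temp_env
```

Under that assumption `path_survived` is false, and the child process would start with no PATH at all even though nothing in the application refers to it.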

-Toshio
