[Python-Dev] Python-3.0, unicode, and os.environ

M.-A. Lemburg mal at egenix.com
Mon Dec 8 15:54:44 CET 2008


On 2008-12-06 01:48, Nick Coghlan wrote:

> You can't display a non-decodable filename to the user, hence the user will have no idea what they're working on. Non-filesystem related apps have no business trying to deal with insane filenames.

This is not entirely true: OSes, shells, and applications will typically represent the file names using either ?-replacements or some form of hex or decimal escapes for the characters they can't decode. Since humans are usually very good at pattern recognition, this goes a long way.
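The two display strategies mentioned above can be sketched with Python's codec error handlers. This is a minimal sketch that assumes the name arrives as raw bytes and that UTF-8 is the expected encoding. `render_filename` is a hypothetical helper, and the `backslashreplace` handler is only accepted for decoding in later Python 3 releases (3.5+), not in the 3.0 under discussion here.

```python
def render_filename(raw: bytes, encoding: str = "utf-8") -> tuple[str, str]:
    """Return a '?'-style and an escape-style rendering of a file name.

    Hypothetical helper for illustration; assumes UTF-8 unless told otherwise.
    """
    # '?'-replacement: each undecodable byte sequence becomes U+FFFD,
    # which is then shown as a plain '?' the way many shells do.
    question = raw.decode(encoding, errors="replace").replace("\ufffd", "?")
    # Escape-style: each undecodable byte is shown as a \xNN hex escape.
    escaped = raw.decode(encoding, errors="backslashreplace")
    return question, escaped


if __name__ == "__main__":
    # A Latin-1 encoded name, which is not valid UTF-8
    print(render_filename(b"caf\xe9.txt"))
```

Both renderings are meant for human eyes only: a user can usually recognize the name, but neither string can be turned back into the original bytes reliably.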

Of course, how the application maps that partially converted file name back to the real one is another issue, and that's something Python should not make harder than it needs to be.
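One way to keep the link between a displayed name and the real one, which Python 3.1 later adopted through PEP 383, is the `surrogateescape` error handler: undecodable bytes are carried through as lone surrogate code points and restored exactly on encoding. The sketch below assumes a UTF-8 filesystem encoding; `decode_name`, `encode_name` and `printable` are illustrative names, not a standard API.

```python
ENCODING = "utf-8"  # assumption: the filesystem encoding is UTF-8


def decode_name(raw: bytes) -> str:
    # Undecodable bytes 0x80-0xFF become lone surrogates U+DC80-U+DCFF.
    return raw.decode(ENCODING, errors="surrogateescape")


def encode_name(name: str) -> bytes:
    # Reverses decode_name() exactly, restoring the original bytes.
    return name.encode(ENCODING, errors="surrogateescape")


def printable(name: str) -> str:
    # Lone surrogates cannot be printed, so escape them for display only.
    return encode_name(name).decode(ENCODING, errors="backslashreplace")


if __name__ == "__main__":
    name = decode_name(b"caf\xe9.txt")
    print(printable(name), encode_name(name))
```

The application stores and passes around the decoded string, shows only `printable(name)` to the user, and calls `encode_name()` whenever it needs the real file name back.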

> Linux is moving towards a standard of UTF-8 for filenames, and once we get to the point where the idea of encoding filenames and environment variables any other way is seen as crazy, then the Python 3 approach will work seamlessly.

It's going to take a long time before file names, environment variables and command line parameters are all encoded using UTF-8, so "practicality beats purity" will have to get more attention in this thread.

Python APIs should work out of the box most of the time.

Currently, if you live in a non-ASCII and non-pure-UTF-8 environment, you have to deal with different and mixed encodings on a regular basis.

Whether it's a USB stick you're trying to read, a ZIP file you're trying to open, or a mounted network drive, the problem pops up in many different areas.
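When names come from sources with different encodings, a common workaround is to try a short list of candidate encodings in order. The sketch below is a heuristic, not a reliable detector; the candidate list (UTF-8, then Windows-1252, then Latin-1 as a catch-all) is an assumption that suits Western European data and would need adjusting elsewhere. `guess_decode` is a hypothetical helper.

```python
from collections.abc import Sequence


def guess_decode(
    raw: bytes, candidates: Sequence[str] = ("utf-8", "cp1252", "latin-1")
) -> tuple[str, str]:
    """Decode raw with the first candidate encoding that succeeds.

    Returns the decoded text and the name of the encoding used.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            # Not valid in this encoding; try the next candidate.
            continue
    raise ValueError(f"none of {list(candidates)} could decode {raw!r}")
```

UTF-8 goes first because text in legacy 8-bit encodings seldom happens to be valid UTF-8. Latin-1 maps every byte to a character and never fails, so it always produces a result, though possibly the wrong one.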

If I write "do_something.py *" I expect Python to indeed work on all the files in my directory, not just the ones that happen to fit a particular encoding.
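In later Python 3 releases on POSIX systems, command-line arguments are decoded with the `surrogateescape` handler, so `os.fsencode()` recovers the exact bytes the shell expanded from `*`. A minimal sketch under that assumption (it does not apply on Windows, where arguments arrive as Unicode); `raw_args` is a hypothetical helper:

```python
import os
import sys


def raw_args(argv: list[str]) -> list[bytes]:
    """Return the original bytes of each command-line argument."""
    # os.fsencode() reverses the decoding Python applied at startup,
    # including any undecodable bytes kept as lone surrogates.
    return [os.fsencode(arg) for arg in argv]


def main() -> None:
    for path in raw_args(sys.argv[1:]):
        # A bytes path works with os.stat(), open(), etc., whether or
        # not it is valid in the current locale's encoding.
        size = os.stat(path).st_size
        print(repr(path), size)


if __name__ == "__main__":
    main()
```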

If I hook up a CGI script written in Python with a web server, I expect all data to be received by the script, not just data that happens to be UTF-8 encoded.
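A CGI script receives request metadata in environment variables and the request body on standard input. The sketch below keeps both as bytes so nothing is lost to a decoding step. `read_cgi_request` is a hypothetical helper; it takes the environment and input stream as arguments so it can be tested without a web server.

```python
from collections.abc import Mapping
from typing import BinaryIO


def read_cgi_request(
    environ: Mapping[bytes, bytes], stdin: BinaryIO
) -> tuple[bytes, bytes]:
    """Return the raw query string and raw request body."""
    # No decoding: the query string is kept exactly as the server sent it.
    query = environ.get(b"QUERY_STRING", b"")
    # An empty or missing CONTENT_LENGTH means there is no body.
    length = int(environ.get(b"CONTENT_LENGTH", b"") or b"0")
    body = stdin.read(length)
    return query, body
```

Under a POSIX web server one would call `read_cgi_request(os.environb, sys.stdin.buffer)`; `os.environb` (added in Python 3.2) exists only on platforms whose environment is bytes-based, so it is not available on Windows.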

> In the meantime, raw bytes APIs will provide an alternative for those that disagree with that philosophy.
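With the bytes APIs, passing a `bytes` path to `os.listdir()` returns the entries as `bytes`, exactly as stored on disk. A short sketch pairing each raw name with its decoded `str` form; `list_names` is a hypothetical helper, and the assumption that the two forms round-trip holds on POSIX systems where `os.fsdecode()` uses `surrogateescape`:

```python
import os


def list_names(path: str) -> list[tuple[bytes, str]]:
    """Pair each directory entry's raw bytes with its decoded str form."""
    pairs = []
    # A bytes argument selects the bytes API: entries come back undecoded.
    for raw in os.listdir(os.fsencode(path)):
        pairs.append((raw, os.fsdecode(raw)))
    return sorted(pairs)


if __name__ == "__main__":
    for raw, text in list_names("."):
        # Both forms name the same file; repr() makes any escapes visible.
        print(repr(raw), repr(text))
```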

I think that's the wrong way to put it: the problems are not invented by people who disagree with the one-encoding-for-everything strategy.

The problems occur in real-life IT processing all the time: maybe not so much in places where English dominates, but certainly in most places that use non-English scripts.

-- Marc-Andre Lemburg eGenix.com

Professional Python Services directly from the Source (#1, Dec 08 2008)

Python/Zope Consulting and Support ... http://www.egenix.com/ mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/



eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/


