[Python-Dev] unicode imports

"Martin v. Löwis" martin at v.loewis.de
Mon Jun 19 22:41:27 CEST 2006


Kristján V. Jónsson wrote:

Wouldn't it be possible then to emulate the Unix way? Simply encode any Unicode paths to UTF-8, process them as normal, and then decode them just prior to the actual Windows I/O call?

That won't work. People also put path names from the ANSI code page onto sys.path and expect that to work - it always worked, and is a nearly-complete work-around to put directories with funny characters onto sys.path. sys.path is a list, so we have little control over what gets put onto it.
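To illustrate why the two conventions cannot share one decoding step, here is a minimal sketch. The directory name and the choice of cp1252 as the ANSI code page are assumptions made up for the example; the point is only that bytes in an ANSI code page are in general not valid UTF-8, so a blanket "decode as UTF-8 just before the call" would reject entries that work today.

```python
# Hypothetical directory name containing a non-ASCII character.
dirname = "C:\\projects\\L\u00f6wis"

# What a user may already have put on sys.path: bytes in the ANSI code page.
# cp1252 is assumed here purely for illustration.
ansi_bytes = dirname.encode("cp1252")

# What the proposed scheme would store instead.
utf8_bytes = dirname.encode("utf-8")


def decode_as_utf8(raw):
    """Return the decoded text, or None if raw is not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


# The UTF-8 entry round-trips; the ANSI entry cannot be decoded at all.
utf8_result = decode_as_utf8(utf8_bytes)
ansi_result = decode_as_utf8(ansi_bytes)
```

Since nothing in the list marks which convention an entry follows, the import code could not tell the two apart reliably.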

Of course, once there, why not do it Unicode all the way up to that last point? Unless there are platforms without wchar_t, that would make sense.

Again, we can't really control that. Also, most platforms have no wchar_t API for file IO. We would have to encode each sys.path element for each stat() call, which would be quite expensive.
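As a rough sketch of where that cost comes from (not the actual import.c logic): if entries were kept as Unicode and encoded on demand, every probe of an (entry, suffix) pair would repeat the encoding. The function name, the suffix list, and the use of UTF-8 below are all assumptions for illustration.

```python
def probe_candidates(module, search_path, suffixes, encoding="utf-8"):
    """Build the byte file names a stat()-style probe would be given.

    Returns the number of encode calls on path entries together with the
    candidate names, so the per-probe overhead is visible.
    """
    encode_calls = 0
    candidates = []
    for entry in search_path:
        for suffix in suffixes:
            raw = entry.encode(encoding)  # repeated for every single probe
            encode_calls += 1
            candidates.append(raw + b"/" + module.encode("ascii") + suffix)
    return encode_calls, candidates


calls, names = probe_candidates("spam", ["/a", "/b", "/c"], [b".py", b".pyc"])
```

With three entries and two suffixes, one import attempt already encodes path entries six times, and the count grows with every import and every extra suffix.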

At any rate, I am trying to find a coding path of least resistance here. Regardless of the timeline or acceptance in mainstream Python for this feature, it is something I will have to patch in for our application.

The path of least resistance should be to use 8.3 directory names. The one to implement in future Python versions should be the rewrite of import.c, to operate on PyObject* instead of char*, and perform conversion to the native API only just before calling the native API.
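The "convert only at the boundary" idea can be sketched in Python rather than C. This is not the import.c rewrite itself, only an illustration under assumptions: find_module_file and to_native are hypothetical names, and os.fsdecode stands in for whatever conversion the native API would need. Entries stay as the objects the user supplied (text or bytes) until the moment the file system is touched.

```python
import os
import tempfile


def to_native(entry):
    """Boundary conversion: the only place that knows about encodings."""
    return os.fsdecode(entry)  # accepts str or bytes, returns str


def find_module_file(module, search_path, suffixes=(".py",)):
    """Return the first existing candidate file for module, or None."""
    for entry in search_path:  # entries are left exactly as supplied
        for suffix in suffixes:
            # Conversion happens only here, just before the file system call.
            candidate = os.path.join(to_native(entry), module + suffix)
            if os.path.isfile(candidate):
                return candidate
    return None


# Usage: the same directory is found whether it is listed as text or bytes.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "spam.py"), "w").close()
    expected = os.path.join(tmp, "spam.py")
    found_from_text = find_module_file("spam", [tmp])
    found_from_bytes = find_module_file("spam", [os.fsencode(tmp)])
    missing = find_module_file("eggs", [tmp])
```

Keeping the conversion in one small function means the search logic never has to care which form an entry arrived in.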

Regards, Martin


