[Python-Dev] Adding the 'path' module (was Re: Some RFE for review)

Guido van Rossum gvanrossum at gmail.com
Mon Jul 11 18:06:18 CEST 2005


I'm in full agreement with Marc-Andre below, except I don't like (1) at all -- having used other APIs that always return Unicode (like the Python XML parsers) it bothers me to get Unicode for no reason at all. OTOH I think Python 3.0 should be using a Unicode model closer to Java's.

On 7/11/05, M.-A. Lemburg <mal at egenix.com> wrote:

Neil Hodgson wrote:
> On unicode versions of Windows, for attributes like os.listdir,
> os.getcwd, sys.argv, and os.environ, which can usefully return unicode
> strings, there are 4 options I see:
>
> 1) Always return unicode. This is the option I'd be happiest to use,
> myself, but expect this choice would change the behaviour of existing
> code too much and so produce much unhappiness.

Would be nice, but will likely break too much code - if you let Unicode
objects enter non-Unicode aware code, it is likely that you'll end up
getting stuck in tons of UnicodeErrors. If you want to get a feeling
for this, try running Python with the -U command line switch.

> 2) Return unicode when the text can not be represented in ASCII. This
> will cause a change of behaviour for existing code which deals with
> non-ASCII data.

+1 on this one (s/ASCII/Python's default encoding).

> 3) Return unicode when the text can not be represented in the default
> code page. While this change can lead to breakage because of combining
> byte string and unicode strings, it is reasonably safe from the point
> of view of data integrity as current code is returning garbage strings
> that look like '?????'.

-1: code pages are evil and the reason why Unicode was invented
in the first place. This would be a step back in history.

> 4) Provide two versions of the attribute, one with the current name
> returning byte strings and a second with a "u" suffix returning
> unicode. This is the least intrusive, requiring explicit changes to
> code to receive unicode data. For patch #1231336 I chose this approach
> producing sys.argvu and os.environu.

-1 - this is what Microsoft did for many of their APIs. The
result is two parallel universes with two sets of features, bugs,
documentation, etc.

> For os.listdir the current behaviour of returning unicode when its
> argument is unicode can be retained but that is not extensible to, for
> example, sys.argv.

I don't think that using the parameter type as a "parameter" to
a function is a good idea. However, accepting both strings and Unicode
will make it easier to maintain backwards compatibility.

> Since this issue may affect many attributes a common approach
> should be chosen.

Indeed.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Jul 11 2005)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
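
As a rough illustration of two behaviours discussed in the quoted message, the sketch below models option 2 (fall back to unicode only when the text does not fit the target encoding) and the os.listdir convention where the result type follows the argument type. It is written for Python 3, where bytes and str stand in for the byte strings and unicode strings of the thread; the helper name narrow_if_possible and the file names are hypothetical and not taken from any patch mentioned here.

```python
import os
import tempfile


def narrow_if_possible(text, encoding="ascii"):
    """Hypothetical helper modelling option 2.

    Returns a byte string when `text` can be represented in `encoding`,
    and the original text string otherwise.
    """
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return text


# Option 2: the result type depends on the data, not on the caller.
plain = narrow_if_possible("README.txt")               # fits in ASCII
accented = narrow_if_possible("r\u00e9sum\u00e9.txt")  # does not fit in ASCII

# os.listdir: the result type follows the argument type.
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "a.txt"), "w").close()
    as_text = os.listdir(d)                # str argument, str names
    as_bytes = os.listdir(os.fsencode(d))  # bytes argument, bytes names

print(type(plain).__name__, type(accented).__name__)
print(as_text, as_bytes)
```

The first half shows why option 2 changes behaviour for existing code: callers receive one of two types depending on the data. The second half shows why the os.listdir approach does not extend to sys.argv, which takes no argument whose type could select the result.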



--
--Guido van Rossum (home page: http://www.python.org/~guido/)


