[Python-Dev] casefolding in pathlib (PEP 428)

Guido van Rossum guido at python.org
Fri Apr 12 00:42:00 CEST 2013


On Thu, Apr 11, 2013 at 2:27 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:

On Thu, 11 Apr 2013 14:11:21 -0700 Guido van Rossum <guido at python.org> wrote:

Hey Antoine,

Some of my Dropbox colleagues just drew my attention to the occurrence of case folding in pathlib.py. Basically, case folding as an approach to comparing pathnames is fatally flawed. The issues include:

- most OSes these days allow the mounting of both case-sensitive and case-insensitive filesystems simultaneously
- the case-folding algorithm on some filesystems is burned into the disk when the disk is formatted

The problem is that:

- if you always make the comparison case-sensitive, you'll get false negatives
- if you make the comparison case-insensitive under Windows, you'll get false positives

My assumption was that, globally, the number of false positives in case (2) is much less than the number of false negatives in case (1).

On the other hand, one could argue that all comparisons should be case-sensitive and the proper way to test for "identical" paths is to access the filesystem. Which makes me think, perhaps concrete paths should get a "samefile" method as in os.path.samefile().

Hmm, I think I'm tending towards the latter right now.

Python on OSX has been using (1) for a decade now without major problems.

Perhaps it would be best if the code never called lower() or upper() (not even indirectly via os.path.normcase()). Then any case-folding and path-normalization bugs are the responsibility of the application, and we won't have to worry about how to fix the stdlib without breaking backwards compatibility if we ever figure out how to fix this (which I somehow doubt we ever will anyway :-).
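A minimal sketch of that split, assuming a scratch directory created with tempfile: the lexical comparison never calls lower(), upper() or os.path.normcase(), and the question "is this the same file?" goes to the filesystem through os.path.samefile(). The helper names same_path_lexically and same_file_on_disk are hypothetical, chosen only for this illustration; they are not part of pathlib or PEP 428.

```python
import os
import tempfile


def same_path_lexically(a, b):
    # Hypothetical helper: purely lexical, case-sensitive comparison.
    # No lower(), upper() or os.path.normcase() is involved.
    return a == b


def same_file_on_disk(a, b):
    # Hypothetical helper: let the filesystem decide.
    # os.path.samefile() stats both paths, so both must exist.
    return os.path.samefile(a, b)


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "README.txt")
    with open(original, "w") as f:
        f.write("hello\n")

    # A different spelling of the same location (extra "." component).
    alias = os.path.join(tmp, ".", "README.txt")

    lexical = same_path_lexically(original, alias)
    on_disk = same_file_on_disk(original, alias)

    # A case variant: whether it resolves depends on the mounted
    # filesystem, which is exactly why no answer is guessed here.
    case_variant = os.path.join(tmp, "readme.TXT")
    variant_exists = os.path.exists(case_variant)

    print("lexically equal:", lexical)
    print("same file on disk:", on_disk)
    print("case variant resolves:", variant_exists)
```

The last value differs between case-sensitive and case-insensitive mounts; only the filesystem that holds the directory can settle it.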

Some other issues to be mindful of:

--
--Guido van Rossum (python.org/~guido)


