[Python-Dev] os.path.normcase rationale?

Steven D'Aprano steve at pearwood.info
Sat Sep 25 05:25:26 CEST 2010


On Sat, 25 Sep 2010 09:22:47 am Guido van Rossum wrote:

> I think that, like os.path.realpath(), it should not fail if the file does not exist.
>
> Maybe the API could be called os.path.unnormpath(), since it is in a sense the opposite of normpath() (which removes case) ? But I would want to write it so that even on Unix it scans the filesystem, in case the filesystem is case-preserving (like the default fs on OS X).

It is not entirely clear to me what this function is actually meant to do. Should it:

  1. Return the case of a filename in some canonical form which depends on the file system?
  2. Return the case of a filename as it is actually stored on disk?
  3. Something else?

and just for completeness:

  4. Return the case of a filename in some arbitrarily-chosen canonical form which does not depend on the file system?

These are not the same, either conceptually or in practice.

If you want #4, you already have it in os.path.normcase.
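To illustrate #4, here is a small sketch that calls the Windows and POSIX flavours of normcase directly, so that it behaves the same on any platform. The example path is made up for illustration.

```python
import ntpath      # the Windows implementation of os.path
import posixpath   # the POSIX implementation of os.path

# A made-up path, used only for illustration.
path = "C:/Users/Chris/ReadMe.TXT"

# The Windows flavour picks an arbitrary canonical form without touching
# the disk: lower case, with forward slashes turned into backslashes.
windows_form = ntpath.normcase(path)

# The POSIX flavour returns the path unchanged.
posix_form = posixpath.normcase(path)

print(windows_form)
print(posix_form)
```

Neither call looks at the file system, which is why the result cannot tell you how the name is actually stored.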

I think that the OP, Chris, wants #1, but it isn't entirely clear to me. It's possible that he wants #2.

Various people have posted links to recipes that solve case #2. Note though that this necessarily demands that if the file doesn't exist, it should raise an exception.
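For concreteness, a minimal sketch of what a recipe for case #2 might look like for a single path component. This is not any of the posted recipes: the name on_disk_name is hypothetical, and str.casefold() is only an approximation of whatever folding rule a real file system applies.

```python
import os


def on_disk_name(directory, name):
    """Return the entry in `directory` that matches `name`, ignoring case.

    Hypothetical helper for illustration only. It compares names with
    str.casefold(), which is an assumption about how case is folded.
    """
    wanted = name.casefold()
    matches = [entry for entry in os.listdir(directory)
               if entry.casefold() == wanted]
    if not matches:
        # Nothing on disk to read the case from, so we have to give up.
        raise FileNotFoundError(os.path.join(directory, name))
    if name in matches:
        # An exact match wins; this matters on case-sensitive file systems,
        # where several entries may differ only by case.
        return name
    return matches[0]
```

Because the answer is read from the directory listing, there is nothing to return when no entry matches, hence the exception.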

In the case of #1, if the file system doesn't exist, we can't predict what the canonical form should be.

The very concept of canonical form for file names is troublesome. If the file system is case-preserving, the file system doesn't define a canonical form: the case of the file name will depend on how the file is initially named. If the file system is case-destructive the behaviour will depend on the file system itself: e.g. FAT12 and ISO 9660 both uppercase file names, but other file systems may make other choices. For some arbitrary path, where we don't know what file system it is, or if the path doesn't actually exist, we have no way of telling what the file system's canonical form will be, or even whether it will have one.

Note that I've been talking about case preservation, not case sensitivity. That's because case preservation is orthogonal to sensitivity. You can see three of the four combinations, e.g.:

Preserving + insensitive: fat32, NTFS under Win32, normally HFS+
Preserving + sensitive: ext3, NTFS under POSIX, optionally HFS+
Destructive + insensitive: fat12, fat16 without long file name support

To the best of my knowledge, destructive + sensitive doesn't exist. It could, in principle, but it would be silly to do so.

Note that just knowing the file system type is not enough to tell what its behaviour will be. Given an arbitrary file system, there's no obvious way to determine what it will do to file names short of trying to create a file and seeing what happens.
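A rough sketch of such a "create a file and see" probe follows. The function name probe_case_behaviour and the probe file name are hypothetical. It assumes the directory is writable, and the answer only describes the file system that holds that directory.

```python
import os
import tempfile


def probe_case_behaviour(directory):
    """Return (preserving, sensitive) for the file system holding `directory`.

    Hypothetical helper: it creates a scratch file with a mixed-case name
    and inspects what the file system did with it.
    """
    name = "CaseProbe.tmp"
    with tempfile.TemporaryDirectory(dir=directory) as scratch:
        with open(os.path.join(scratch, name), "w"):
            pass
        # Preserving: the name comes back exactly as we wrote it.
        preserving = name in os.listdir(scratch)
        # Sensitive: the same name with its case swapped is NOT found.
        swapped = os.path.join(scratch, name.swapcase())
        sensitive = not os.path.exists(swapped)
    return preserving, sensitive


preserving, sensitive = probe_case_behaviour(tempfile.gettempdir())
print("case-preserving:", preserving)
print("case-sensitive:", sensitive)
```

The scratch directory is removed when the with block exits, so the probe leaves nothing behind.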

-- Steven D'Aprano


