msg78338 - (view) |
Author: Skip Montanaro (skip.montanaro) *  |
Date: 2008-12-27 04:00 |
os.path.commonprefix returns the common prefix of a list of paths taken character-by-character. This can return invalid paths. For example, os.path.commonprefix(["/export/home/dave", "/etc/passwd"]) will return "/e", which likely has no meaning as a path, at least in the context of the input list. Ideally, os.path.commonprefix would operate component-by-component, but people rely on the existing character-by-character operation, so it has so far been impossible to change the semantics.

There are several possible ways to solve this problem. One, change how commonprefix behaves. Two, add a flag to commonprefix to allow it to operate component-by-component if desired. Three, add a new function to os.path.

I personally prefer the first option. Aside from the semantic change, though, it presents the problem of where to put the old definition of commonprefix. It's clearly of some use or people wouldn't have co-opted it for non-filesystem use. It could go in the string module, but that's been living a life in limbo since the creation of string methods. People have been loath to add new functionality there.

The second option seems to me like it would just be a hack on top of already broken behavior, and it would probably require the currently slightly broken behavior as the default to boot, so I won't go there.

Since option one is perhaps not going to be available to me, I've implemented the third option as a new function, commonpathprefix. See the attached patch. It includes test cases and documentation changes. |
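To make the complaint concrete, here is a minimal sketch contrasting the character-by-character result with a component-by-component one. The helper `common_components` is hypothetical (it is not the attached patch), and it naively splits on "/" only, so it is an illustration for POSIX-style strings rather than a portable implementation.

```python
import posixpath

paths = ["/export/home/dave", "/etc/passwd"]

# Character-by-character: the result is a string prefix, not a directory.
char_prefix = posixpath.commonprefix(paths)


def common_components(paths):
    """Hypothetical helper: longest shared run of leading "/"-separated components."""
    if not paths:
        return ""
    split_paths = [p.split("/") for p in paths]
    shared = []
    for parts in zip(*split_paths):
        if len(set(parts)) != 1:
            break
        shared.append(parts[0])
    # An absolute path splits to a leading "", so a lone "" means the root.
    if shared == [""] and paths[0].startswith("/"):
        return "/"
    return "/".join(shared)


component_prefix = common_components(paths)
print(repr(char_prefix), repr(component_prefix))  # '/e' '/'
```

The component-wise answer for the example in this message is the root directory, which is at least a real path shared by both inputs.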
|
|
msg78339 - (view) |
Author: Alyssa Coghlan (ncoghlan) *  |
Date: 2008-12-27 04:24 |
A new function sounds like a good solution to me. How about just calling it "os.path.commonpath" though? I agree having a path component based prefix function in os.path is highly desirable, particularly since the addition of relpath in 2.6:

    base_dir = os.path.commonpath(paths)
    rel_paths = [os.path.relpath(p, base_dir) for p in paths]
|
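For readers on a current Python, the idiom above can be tried directly. This is a small sketch that assumes Python 3.5 or later, where a function named commonpath exists; it uses posixpath explicitly so the result does not depend on the host platform, and the sample paths are made up for illustration.

```python
import posixpath

# Made-up sample paths, all absolute so the current directory is never consulted.
paths = ["/usr/lib/python3", "/usr/local/lib", "/usr/share/doc"]

base_dir = posixpath.commonpath(paths)
rel_paths = [posixpath.relpath(p, base_dir) for p in paths]

print(base_dir)   # /usr
print(rel_paths)  # ['lib/python3', 'local/lib', 'share/doc']
```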
|
|
msg78529 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2008-12-30 13:24 |
The documentation should explain what a "common path prefix" is. It can't be the path to a common parent directory, since the new function doesn't allow mixing absolute and relative directories. As Phillip Eby points out, it also doesn't account for case-insensitivity that some file systems or operating systems implement, nor does it take into account short file names on Windows. |
|
|
msg78530 - (view) |
Author: Skip Montanaro (skip.montanaro) *  |
Date: 2008-12-30 13:51 |
I think we need to recognize the inherent limitations of what we can expect to do. It is perfectly reasonable for a user on Windows to import posixpath and call posixpath.commonpathprefix. The function won't have access to the actual filesystems being manipulated. Same for Unix folks importing ntpath and manipulating Windows paths. While we can make it handle case-insensitivity, I'm not sure we can do much, if anything, about shortened filenames. Also, as long as we are considering case sensitivity, what about HFS on Mac OS X? Skip |
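The point about working purely on strings can be illustrated with ntpath, which is importable on any platform. The sketch below folds case with ntpath.normcase before taking the existing character-based prefix; the sample paths are invented, and nothing here touches a real filesystem, so short (8.3) file names would not be resolved by this approach.

```python
import ntpath

# Invented Windows-style paths that differ only in letter case up to the last component.
win_paths = [r"C:\Users\Dave\src", r"c:\users\dave\docs"]

# Without case folding the very first character already differs.
raw_prefix = ntpath.commonprefix(win_paths)

# normcase lowercases and converts "/" to "\", purely as a string operation.
folded = [ntpath.normcase(p) for p in win_paths]
folded_prefix = ntpath.commonprefix(folded)

print(repr(raw_prefix))     # ''
print(repr(folded_prefix))  # 'c:\\users\\dave\\'
```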
|
|
msg78532 - (view) |
Author: Alyssa Coghlan (ncoghlan) *  |
Date: 2008-12-30 13:55 |
1. The discussion on python-dev shows that the current documentation of os.path.commonprefix is incorrect: it technically works element by element rather than character by character (since it will handle sequences other than strings, such as lists of path components).

2. Splitting on os.sep is not the correct way to break a string into path components. Instead, os.path.split needs to be applied repeatedly until "head" is a single character (a single occurrence of os.sep or os.altsep for an absolute path) or empty (for a relative path). (Alternatively, but with additional effects on the result, the separators can be normalised first with os.path.normpath or os.path.normcase.) For Windows, os.path.splitunc and os.path.splitdrive should also be invoked first, and if either returns a non-empty string, that should become the first path component (with the remaining components filled in as above).

3. Calling any or all of abspath/expanduser/expandvars/normcase/normpath/realpath is the responsibility of the library user as far as os.path.commonprefix is concerned. Should that behaviour be retained for an os.path.commonpath function, or should some of them (such as os.path.abspath) be called automatically? |
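Points 1 and 2 can be combined in a short sketch: split each path by applying split repeatedly, then hand the resulting lists to the existing commonprefix, which compares them element by element. The helper names `split_components` and `common_dir` are hypothetical, the loop stops when split no longer shrinks the head (a slightly looser test than "a single character"), and only the posixpath case is exercised here; the drive and UNC handling described for Windows is left out.

```python
import posixpath


def split_components(path, pathmod=posixpath):
    """Hypothetical helper: break a path into components via repeated pathmod.split."""
    parts = []
    while path:
        head, tail = pathmod.split(path)
        if head == path:
            # Reached a root such as "/" that split cannot shrink any further.
            parts.append(head)
            break
        if tail:  # skip the empty tail produced by a trailing separator
            parts.append(tail)
        path = head
    parts.reverse()
    return parts


def common_dir(paths):
    """Hypothetical helper: component-wise common prefix, joined back into a path."""
    common = posixpath.commonprefix([split_components(p) for p in paths])
    return posixpath.join(*common) if common else ""


print(split_components("/usr/local/lib"))        # ['/', 'usr', 'local', 'lib']
print(common_dir(["/usr/local/lib", "/usr/lib"]))  # /usr
```

Passing lists to commonprefix relies on the element-by-element behaviour noted in point 1, not on anything the documentation of the time promised.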
|
|
msg78533 - (view) |
Author: Alyssa Coghlan (ncoghlan) *  |
Date: 2008-12-30 14:05 |
The regex based approach to the component splitting when os.altsep is defined obviously works as well. Duplicating the values of sep and altsep in the default regex that way grates a little though... |
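One way to avoid hard-coding the separator values in a default regex is to build the pattern from the module's own sep and altsep. This is only a sketch of that idea, not the code from the patch under discussion; the function name `split_on_separators` is hypothetical.

```python
import ntpath
import posixpath
import re


def split_on_separators(path, pathmod=ntpath):
    """Hypothetical helper: split on sep and, when defined, altsep of the given path module."""
    seps = re.escape(pathmod.sep)
    if pathmod.altsep:
        seps += re.escape(pathmod.altsep)
    return re.split("[" + seps + "]", path)


print(split_on_separators("C:\\a/b\\c"))            # ['C:', 'a', 'b', 'c']
print(split_on_separators("/usr/lib", posixpath))   # ['', 'usr', 'lib']
```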
|
|
msg111589 - (view) |
Author: Craig McQueen (cmcqueen1975) |
Date: 2010-07-26 02:28 |
http://code.activestate.com/recipes/577016-path-entire-split-commonprefix/ |
|
|
msg227699 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2014-09-27 16:45 |
There is a more developed patch in . |
|
|
msg227707 - (view) |
Author: Skip Montanaro (skip.montanaro) *  |
Date: 2014-09-27 18:28 |
Feel free to close this ticket. I long ago gave up on it. |
|
|
msg293143 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2017-05-05 21:53 |
Issue 10395 added “os.path.commonpath” in 3.5. |
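As a quick check of how the function that landed compares with commonprefix, the sketch below runs both on the example from the first message, using posixpath so the output is the same on every platform. It also shows that mixing absolute and relative paths is rejected, which matches the restriction raised earlier in this thread. Python 3.5 or later is assumed.

```python
import posixpath

paths = ["/export/home/dave", "/etc/passwd"]

char_based = posixpath.commonprefix(paths)  # string prefix
path_based = posixpath.commonpath(paths)    # longest common sub-path

print(repr(char_based), repr(path_based))   # '/e' '/'

# Mixing absolute and relative paths raises ValueError.
try:
    posixpath.commonpath(["/usr/lib", "usr/lib"])
    mixed_rejected = False
except ValueError:
    mixed_rejected = True

print(mixed_rejected)  # True
```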
|
|