[Python-Dev] Path object design
Mike Orr sluggoster at gmail.com
Wed Nov 1 04:14:13 CET 2006
I just saw the Path object thread ("PEP 355 status", Sept-Oct), saying that the first object-oriented proposal was rejected. I'm in favor of the "directory tuple" approach, which wasn't mentioned in the thread. This was proposed by Noam Raphael several months ago: a Path object that's a sequence of components (a la os.path.split) rather than a string. The beauty of this approach is that slicing and joining are expressed naturally using the [] and + operators, eliminating several methods.
Introduction: http://wiki.python.org/moin/AlternativePathClass
Feature discussion: http://wiki.python.org/moin/AlternativePathDiscussion
Reference implementation: http://wiki.python.org/moin/AlternativePathModule
(There's a link to the introduction at the end of PEP 355.) Right now I'm working on a test suite, then I want to add the features marked "Mike" in the discussion -- in a way that people can compare the feature alternatives in real code -- and write a PEP. But it's a big job for one person, and there are unresolved issues on the discussion page, not to mention things brought up in the "PEP 355 status" thread. We had three people working on the discussion page but development seems to have ground to a halt.
One thing is sure -- we urgently need something better than os.path. It functions well but it makes for hard-to-read and unpythonic code. For instance, I have an application that has to add its libraries to the Python path, relative to the executable's location.
/toplevel
    app1/
        bin/
            main_progam.py
            utility1.py
            init_app.py
        lib/
            app_module.py
    shared/
        lib/
            shared_module.py
The solution I've found is an init_app module in every application that sets up the paths. Conceptually it needs "../lib" and "../../shared/lib", but I want the absolute paths without hardcoding them, in a platform-neutral way. With os.path, "../lib" is:
os.path.join(os.path.dirname(os.path.dirname(__file__)), "lib")
YUK! Compare to PEP 355:
Path(__file__).parent.parent.join("lib")
Much easier to read and debug. Under Noam's proposal it would be:
Path(__file__)[:-2] + "lib"
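To make the slicing idea concrete, here is a minimal sketch of a path held as a tuple of components. It is only an illustration, not the wiki reference implementation: the class name TuplePath and its behaviour are assumptions, and it only handles POSIX-style paths (no drive letters).

```python
import os


class TuplePath(tuple):
    """Hypothetical sketch: a path held as a tuple of components."""

    def __new__(cls, path=()):
        if isinstance(path, str):
            # Normalize, then split on the platform separator, dropping empty parts.
            parts = [p for p in os.path.normpath(path).split(os.sep) if p]
            # Keep a leading separator as its own root component.
            if os.path.isabs(path):
                parts.insert(0, os.sep)
            path = parts
        return super().__new__(cls, path)

    def __getitem__(self, index):
        result = super().__getitem__(index)
        # Slices stay TuplePath objects; integer indexes return plain strings.
        return TuplePath(result) if isinstance(index, slice) else result

    def __add__(self, other):
        # Accept a single component (str) or another sequence of components.
        if isinstance(other, str):
            other = (other,)
        return TuplePath(tuple(self) + tuple(other))

    def __str__(self):
        return os.path.join(*self) if self else os.curdir


# Drop the last two components ("bin" and the file name), then append "lib".
here = TuplePath("/toplevel/app1/bin/init_app.py")
lib_dir = here[:-2] + "lib"
print(lib_dir)
```

With the layout above, slicing off the file name and its directory and adding "lib" yields the application's lib directory, without any dirname/join nesting.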
I'd also like to see the methods made more intelligent: don't raise an error if an operation is already done (e.g., a directory already exists or a file is already removed). There's no reason to clutter one's code with extra if's when the methods can easily encapsulate this. Some considered this too radical a departure from os.path, but I have in mind even more radical convenience methods, which I'd put in a third-party subclass if they're not accepted into the standard library, the way 'datetime' has third-party subclasses.
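As a rough sketch of what "already done is not an error" could look like, here are two standalone helpers. The names ensure_dir and ensure_removed are hypothetical, not part of PEP 355 or any proposal discussed here; they rely on os.makedirs(exist_ok=True) and contextlib.suppress, which exist in current Python versions.

```python
import contextlib
import os
import tempfile


def ensure_dir(path):
    """Create the directory (and parents); do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)


def ensure_removed(path):
    """Remove a file; do nothing if it is already gone."""
    with contextlib.suppress(FileNotFoundError):
        os.remove(path)


# Demonstration inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "cache")
    ensure_dir(target)
    ensure_dir(target)  # second call is a no-op, no "if exists" check needed
    stale = os.path.join(target, "stale.txt")
    ensure_removed(stale)  # the file never existed; still no error
```

The caller states the desired end state and the helper absorbs the "already done" case, which is the kind of encapsulation I'd like the Path methods to provide.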
In my application I started using Orendorff's path module, expecting the standard path object would be close to it. When PEP 355 started getting more changes and the directory-based alternative took off, I took path.py out and rewrote my code for os.path until an alternative becomes more stable. Now it looks like it will be several months and possibly several third-party packages until one makes it into the standard library. This is unfortunate. Not only does it mean ugly code in applications, but it means packages can't accept or return Path objects and expect them to be compatible with other packages.
The reasons PEP 355 was rejected also sound strange. Nick Coghlan wrote (Oct 1):
Things the PEP 355 path object lumps together:
- string manipulation operations
- abstract path manipulation operations (work for non-existent filesystems)
- read-only traversal of a concrete filesystem (dir, stat, glob, etc)
- addition & removal of files/directories/links within a concrete filesystem
Dumping all of these into a single class is certainly practical from a utility point of view, but it's about as far away from beautiful as you can get, which creates problems from a learnability point of view, and from a capability-based security point of view.
What about the convenience of the users and the beauty of users' code? That's what matters to me. And I consider one class easier to learn. I'm tired of memorizing that 'split' is in os.path while 'remove' and 'stat' are in os. This seems arbitrary: you're statting a path, aren't you? Also, if you have four classes (abstract path, file, directory, symlink), each of those will have 3+ platform-specific versions. Then if you want to make an enhancement subclass you'll have to make 12 of them, one for each of the 3*4 combinations of superclasses. Encapsulation can help with this, but it strays from the two-line convenience for the user:
from path import Path
p = Path("ABC") # Works the same for files/directories on any platform.
Nevertheless, I'm open to seeing a multi-class API, though hopefully less verbose than Talin's preliminary one (Oct 26). Is it necessary to support path.parent(), pathobj.parent(), io.dir.listdir(), and io.dir.Directory()? That's four different namespaces in which to memorize which function/method is where, and if a function/method belongs to multiple ones it'll be duplicated, and you'll have to remember that some methods are duplicated and others aren't... Plus, constructors like io.dir.Directory() look too verbose. io.Directory() might be acceptable, with the functions as class methods.
I agree that supporting non-filesystem directories (zip files, CVS/Subversion sandboxes, URLs) would be nice, but we already have a big enough project without that. What constraints should a Path object keep in mind in order to be forward-compatible with this?
If anyone has design ideas/concerns about a new Path class(es), please post them. If anyone would like to work on a directory-based spec/implementation, please email me.
-- Mike Orr <sluggoster at gmail.com>