Pathlib and os.path: feature parity and code de-duplication
Hi all,
A quick rundown of some notable feature differences and duplications between pathlib and os.path, also showing changes in the last year or so.
Path.expanduser() and os.path.expanduser() [complete!]
These implementations were almost identical, save for some subtleties in how Windows home directories are guessed.
Addressed in PR 18841, which deleted pathlib’s implementation and made it call os.path.expanduser().
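As a quick illustration of the parity described above, here is a minimal sketch that expands the same string both ways and compares the results. The path `~/projects/demo` is just a made-up example, and the sketch assumes the home directory can be determined on the machine running it.

```python
import os.path
from pathlib import Path

# Hypothetical user-relative path, used only for illustration.
raw = "~/projects/demo"

# Object-oriented spelling.
via_pathlib = Path(raw).expanduser()

# Procedural spelling, wrapped in Path so the two results compare cleanly.
via_os_path = Path(os.path.expanduser(raw))

print(via_pathlib == via_os_path)
```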
Path.resolve() and os.path.realpath() [complete!]
Only resolve() was capable of throwing exceptions when missing files or symlink loops were encountered, whereas realpath() always appended the remaining path segment and returned without indicating an error.
This was addressed in PR 25264, which added a strict argument to realpath(), deleted pathlib’s own implementation and made it call realpath().
PurePath() and os.path.normpath() [pr available!]
PurePath automatically applies safe normalization to paths, e.g. redundant separators and . entries are removed. It does not collapse .. entries, as doing so cannot be done safely unless we also resolve symlinks along the way, which requires filesystem access.
Path objects provide a resolve() method that will safely resolve symlinks and .. entries simultaneously.
On the other hand, os.path.normpath() always naively collapses .. entries, which can change the meaning of paths involving symlinks. There’s no equivalent to PurePath’s normalization.
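To make the difference concrete, here is a small sketch using the POSIX flavours directly so that it behaves the same on any platform. The input string is made up for the example.

```python
import posixpath
from pathlib import PurePosixPath

# Redundant separator, a "." entry and a ".." entry in one path.
raw = "docs//./guide/../index.rst"

# PurePath drops the extra separator and "." but keeps "..".
pure = str(PurePosixPath(raw))

# normpath() also collapses "..", which is only safe if "guide" is not a symlink.
normed = posixpath.normpath(raw)

print(pure)    # docs/guide/../index.rst
print(normed)  # docs/index.rst
```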
I’ve opened PR 26694 to add a strict argument to normpath().
PurePath.is_reserved() and os.path.??? [todo!]
There’s no equivalent to pathlib’s PurePath.is_reserved() in os.path. For full parity this should be added.
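For reference, this is roughly what the pathlib side looks like today. The file names are arbitrary examples, and the warnings filter is only there on the assumption that newer Python releases may flag this method as deprecated.

```python
import warnings
from pathlib import PureWindowsPath

with warnings.catch_warnings():
    # Silence a possible DeprecationWarning on newer Python versions.
    warnings.simplefilter("ignore", DeprecationWarning)

    # "NUL" is a reserved device name on Windows.
    reserved = PureWindowsPath("NUL").is_reserved()

    # An ordinary file name is not reserved.
    ordinary = PureWindowsPath("notes.txt").is_reserved()

print(reserved, ordinary)  # True False
```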
PurePath.as_uri() and os.path.??? [todo!]
There’s no equivalent to pathlib’s PurePath.as_uri() in os.path. For full parity this should be added.
… and I think that’s everything!
With these changes in place, pathlib’s _Flavour abstraction is entirely vestigial and can be safely removed. By moving the OS-specific bits into the low-level ntpath and posixpath modules, we free pathlib from the burden of re-implementing OS path quirks. That in turn allows for some careful refactoring as proposed by @kfollstad here:
Any feedback/questions/concerns very welcome! Thanks for reading.
Cheers
barneygale (Barney Gale) June 12, 2021, 9:03pm 3
The functionality you’re proposing exists in neither pathlib nor os.path presently, and so doesn’t seem relevant to this thread.
uranusjr (Tzu-ping Chung) June 12, 2021, 9:21pm 4
There is urllib.request.pathname2url which needs to be taken into consideration if you intend to unify this.
But one fundamental question I have is why os.path needs feature parity with pathlib in the first place. It makes perfect sense the other way around; Path objects are preferred over string-form paths. It’s difficult to justify adding brand-new things to os.path; you couldn’t do it previously, so this is obviously new code. Why don’t you just use pathlib instead?
barneygale (Barney Gale) June 12, 2021, 9:51pm 5
> There is urllib.request.pathname2url which needs to be taken into consideration if you intend to unify this.
This looks perfect, thanks for the tip. I’ll play around with using it in pathlib. That obviates the need for an os.path.uri()-like function.
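For anyone comparing the two, here is a rough sketch that runs the same absolute path through both functions. The file name is invented, and the sketch makes no claim that the two outputs are interchangeable; it only shows that both percent-encode the path.

```python
from pathlib import Path
from urllib.request import pathname2url

# An absolute path containing a space, so the percent-encoding is visible.
target = Path.cwd() / "my file.txt"

# pathlib produces a complete file: URI.
uri_from_pathlib = target.as_uri()

# pathname2url() produces only the path component of a URL.
url_path = pathname2url(str(target))

print(uri_from_pathlib)
print(url_path)
```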
> But one fundamental question I have is why os.path needs feature parity with pathlib in the first place.
Ultimately, it’s to solve bpo-24132, i.e. support subclassing pathlib.Path.
The pathlib internals are a bit of a mess, which greatly constrains work on that bug. A lot of things we want to say about the abstractions aren’t quite true, e.g.:
- All OS access happens via _Accessor
- All syntax manip happens in _Flavour
One of these is “all OS-specific functionality is implemented in posixpath or ntpath”. That statement is currently 95% true, e.g. all the OS-specific resolve() stuff is delegated to realpath(). The stragglers are the focus of this thread.
By slimming the PurePosixPath and PureWindowsPath classes down to almost nothing, and removing _Flavour, we make the sort of refactors @kfollstad has proposed feasible without breaking backwards compat.
As a secondary reason, I don’t think the existence of pathlib should mean we stop work on os.path. It’s not deprecated. The overlap of functionality between the modules is >90%, and to my mind the key difference is in approach: OOP or procedural. In that framework parity makes sense, because str vs Path is only a user choice.
Julien00859 (Julien Castiaux) June 21, 2021, 8:53am 6
Hello there,
In our code base we have the following function:
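    from os.path import abspath, expanduser, expandvars, realpath

    def _normalize(self, path):
        if not path:
            return ''
        # Expand $VARS and ~, then make the path absolute and resolve symlinks.
        return realpath(abspath(expanduser(expandvars(path.strip()))))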
I would like to convert it to pathlib, but it seems os.path.expandvars is missing from pathlib. Maybe I’m wrong and it exists under another name, but at a quick glance through the docs I couldn’t find anything “var”-related.
Another question that is a bit out of scope: I’m copy-pasting that function into various projects and using it as a complete “path sanitizer” function. I would love to see a pathlib equivalent so that I could just do “Path(user_provided_path).sanitize()”. The topic has been brought up on the forum in “Have a `.realpath` classmethod in pathlib.Path - #19 by sinoroc”, and I would like to know what was decided about it.
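For what it’s worth, a pathlib-flavoured version of the function above might look something like the sketch below. It assumes expandvars stays in os.path and is applied to the string before a Path is built; the standalone `normalize` name is hypothetical.

```python
import os
from pathlib import Path


def normalize(path: str) -> str:
    """Hypothetical pathlib-based counterpart of the _normalize() method above."""
    if not path:
        return ""
    # Environment variables are expanded on the raw string, before building a Path.
    expanded = os.path.expandvars(path.strip())
    # expanduser() handles "~"; resolve() makes the path absolute and follows symlinks.
    return str(Path(expanded).expanduser().resolve())
```

One difference to keep in mind: abspath() collapses .. entries before realpath() sees them, whereas resolve() handles symlinks and .. entries together, so the two versions may disagree on paths that go through a symlink and then back up with ...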
pf_moore (Paul Moore) June 21, 2021, 10:04am 7
The functionality in os.path.expandvars is unrelated to path handling (it can be used on arbitrary strings) so I don’t see any justification for adding it to pathlib. If it weren’t for the backward compatibility implications, I’d suggest renaming it as os.expandvars, but the disruption would be far too great to make that worth it.
Julien00859 (Julien Castiaux) June 21, 2021, 11:15am 8
It makes sense, better to leave it that way then.