gh-68320, gh-88302 - Allow for pathlib.Path
subclassing by barneygale · Pull Request #31691 · python/cpython (original) (raw)
@barneygale Apologies, (1) I was trying to simplify my sample code and didn't copy/paste it properly. (2) No "vinegar" intended, just trying to be succinct. (In fact I intentionally noted that "I love the idea of pathlib. I use it as often as I possibly can.")
If you have any specific criticisms or questions you have about pathlib's design I'll do my best to address them.
Yes, I can be more specific: pathlib's __new__
/__init__
behavior is unexpected and still "breaks" subclassing. My example code very clearly demonstrates that, and I'll include it in its entirety below.
(I don't necessarily think the Path subclassing needs to be fixed. It's be great if it did actually get fixed, but at least a prominent disclaimer of "not intended to be subclassed" at the top of the module's doc page would be appropriate.)
First, though:
generally you don't need to worry about slots unless performance is a concern
In my real-world use case, I am dealing with tens of thousands of Paths, so performance is a concern for me.
But even if it weren't, it doesn't change that the pathlib behavior is unexpected. Any state that is managed in a Path subclass's __new__
or __init__
gets lost, as you will see below.
To be more clear, I'll even remove my use of __slots__
for the time being - the example will still fail.
Now back to the proposed fix (which doesn't work). Actually run this on 3.12.0a6:
from pathlib import Path
class ExtPath(Path):
def __init__(self, *args):
super().__init__(*args)
self.value = 'this will disappear'
if __name__ == '__main__':
extpath = ExtPath('Spam/eggs.txt')
assert extpath.value == 'this will disappear'
assert type(extpath.parent) is ExtPath
assert extpath.parent.value == 'this will disappear'
You'll get TypeError: object.__init__() takes exactly one argument (the instance to initialize)
. This is because pathlib's __new__
impls are not consistent w/r/t __init__
. (Neither Path nor PurePath even define __init__
, so it's object.__init__
that would receive the proposed "fix" - which is why it doesn't work.)
Even if the super call is corrected, it still produces the unexpected result from my initial posting (granted I should've just posted the full example in the first place):
from pathlib import Path
class ExtPath(Path):
def __init__(self, *args):
super().__init__() # object.__init__
self.value = 'this will disappear'
if __name__ == '__main__':
extpath = ExtPath('Spam/eggs.txt')
assert extpath.value == 'this will disappear'
assert type(extpath.parent) is ExtPath
assert extpath.parent.value == 'this will disappear'
Running this results in an unexpected AttributeError:
$ python extpath.py
Traceback (most recent call last):
File "/tmp/extpath.py", line 13, in <module>
assert extpath.parent.value == 'this will disappear'
^^^^^^^^^^^^^^^^^^^^
AttributeError: 'ExtPath' object has no attribute 'value'
Why does this happen?
Take a close look at 3.12.0a6's pathlib.py, lines 302-317. Here you will find the PurePath._from_parts and PurePath._from_parsed_parts methods.
These methods are used, from what I gather inspecting the code, to offer performance improvements by avoiding the need to re-parse the path segments. But they do so by bypassing __new__
and __init__
altogether!
Note the use of object.__new__(cls)
in these methods to create the objects. As a result, neither PurePath.__new__
nor Path.__new__
are invoked (same for __init__
s, even if they were defined).
This means that any property or method of (a subclass of) Path that returns a new Path (like .parent
in my example) will return what seems to be an object of the correct type (note my second assert), but any state that that subclass manages, either in its own __new__
or __init__
, is lost; hence the AttributeError.