gh-68320, gh-88302 - Allow for pathlib.Path subclassing by barneygale · Pull Request #31691 · python/cpython (original) (raw)

@barneygale Apologies, (1) I was trying to simplify my sample code and didn't copy/paste it properly. (2) No "vinegar" intended, just trying to be succinct. (In fact I intentionally noted that "I love the idea of pathlib. I use it as often as I possibly can.")

If you have any specific criticisms or questions you have about pathlib's design I'll do my best to address them.

Yes, I can be more specific: pathlib's __new__/__init__ behavior is unexpected and still "breaks" subclassing. My example code very clearly demonstrates that, and I'll include it in its entirety below.
(I don't necessarily think the Path subclassing needs to be fixed. It's be great if it did actually get fixed, but at least a prominent disclaimer of "not intended to be subclassed" at the top of the module's doc page would be appropriate.)

First, though:

generally you don't need to worry about slots unless performance is a concern

In my real-world use case, I am dealing with tens of thousands of Paths, so performance is a concern for me.
But even if it weren't, it doesn't change that the pathlib behavior is unexpected. Any state that is managed in a Path subclass's __new__ or __init__ gets lost, as you will see below.
To be more clear, I'll even remove my use of __slots__ for the time being - the example will still fail.

Now back to the proposed fix (which doesn't work). Actually run this on 3.12.0a6:

from pathlib import Path

class ExtPath(Path):

    def __init__(self, *args):
        super().__init__(*args)
        self.value = 'this will disappear'

if __name__ == '__main__':
    extpath = ExtPath('Spam/eggs.txt')
    assert extpath.value == 'this will disappear'
    assert type(extpath.parent) is ExtPath
    assert extpath.parent.value == 'this will disappear'

You'll get TypeError: object.__init__() takes exactly one argument (the instance to initialize). This is because pathlib's __new__ impls are not consistent w/r/t __init__. (Neither Path nor PurePath even define __init__, so it's object.__init__ that would receive the proposed "fix" - which is why it doesn't work.)

Even if the super call is corrected, it still produces the unexpected result from my initial posting (granted I should've just posted the full example in the first place):

from pathlib import Path

class ExtPath(Path):

    def __init__(self, *args):
        super().__init__()  # object.__init__
        self.value = 'this will disappear'

if __name__ == '__main__':
    extpath = ExtPath('Spam/eggs.txt')
    assert extpath.value == 'this will disappear'
    assert type(extpath.parent) is ExtPath
    assert extpath.parent.value == 'this will disappear'

Running this results in an unexpected AttributeError:

$ python extpath.py 
Traceback (most recent call last):
  File "/tmp/extpath.py", line 13, in <module>
    assert extpath.parent.value == 'this will disappear'
           ^^^^^^^^^^^^^^^^^^^^
AttributeError: 'ExtPath' object has no attribute 'value'

Why does this happen?

Take a close look at 3.12.0a6's pathlib.py, lines 302-317. Here you will find the PurePath._from_parts and PurePath._from_parsed_parts methods.

These methods are used, from what I gather inspecting the code, to offer performance improvements by avoiding the need to re-parse the path segments. But they do so by bypassing __new__ and __init__ altogether!

Note the use of object.__new__(cls) in these methods to create the objects. As a result, neither PurePath.__new__ nor Path.__new__ are invoked (same for __init__s, even if they were defined).

This means that any property or method of (a subclass of) Path that returns a new Path (like .parent in my example) will return what seems to be an object of the correct type (note my second assert), but any state that that subclass manages, either in its own __new__ or __init__, is lost; hence the AttributeError.