bpo-36305: Fixes to path handling and parsing in pathlib by kmaork · Pull Request #12361 · python/cpython

The bugs
This PR fixes three bugs with path parsing in pathlib.
The following examples show these bugs (when the cwd is C:\d):

  1. WindowsPath('C:a').absolute() should return WindowsPath('C:\\d\\a') but returns WindowsPath('C:a').
    This is caused by flawed logic in the parse_parts method of the _Flavour class.
  2. WindowsPath('./b:a').absolute() should return WindowsPath('C:\\d\\b:a') but returns WindowsPath('b:a').
    This is caused by the limited interface of parse_parts, and affects the Path.absolute, Path.expanduser and Path.__rtruediv__ methods.
  3. WindowsPath('./b:a').resolve() should return WindowsPath('C:\\d\\b:a') but returns WindowsPath('b:a').
    This is caused by missing logic in the resolve method and in Path.__str__.
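
The parsing ambiguity behind bugs 2 and 3 can be seen without touching the file system. The following is a minimal sketch using PureWindowsPath (chosen here so that it runs on any platform; it is not code from this PR). It only inspects the parsed drive and root, which do not depend on the cwd:

```python
from pathlib import PureWindowsPath

# 'C:a' is drive-relative: it has a drive but no root,
# so it is not an absolute path.
drive_relative = PureWindowsPath('C:a')
print(repr(drive_relative.drive), repr(drive_relative.root))
print(drive_relative.is_absolute())

# 'b:a' is parsed as the drive 'b:' followed by the relative part 'a'.
ambiguous = PureWindowsPath('b:a')

# './b:a' has no drive: it names the entry 'b:a' in the current
# directory (file 'b' with the data-stream 'a').
explicit = PureWindowsPath('./b:a')
print(repr(ambiguous.drive), repr(explicit.drive))

# Re-parsing the flattened last part of './b:a' loses that distinction:
# the string 'b:a' alone is read as a drive-relative path again.
reparsed = PureWindowsPath(explicit.parts[-1])
print(repr(reparsed.drive))
```

The last step is the lossy round trip described in the second fix: once the leading './' is gone, the string 'b:a' by itself no longer says which interpretation was meant.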

The fixes

  1. To fix the first bug, I fixed a flaw in the parse_parts method.
  2. The second bug was more complicated: it can't be fixed with the current interface of parse_parts (called by _parse_args). Let's take a simple example: WindowsPath(WindowsPath('./a:b')) and WindowsPath('a:b'). Before the bugfix, they are equal, because in both cases parse_parts is called with ['a:b']. This part can be interpreted in two ways: either as the relative path 'b' with the drive 'a:', or as a file 'a' with the NTFS data-stream 'b'.
    That means that in some cases, passing the flattened _parts of a path to parse_parts is lossy. Therefore we have to modify the interface of parse_parts to enable passing more detailed information about the given parts. What I decided to do was allow passing tuples in addition to strings, thus still supporting the old interface. The tuples contain the drive, root and path parts, which is enough information to determine the correct parsing of path parts with data-streams, and possibly of other edge cases in the future.
    After modifying parse_parts's interface, I changed _parse_args to use it and made Path.absolute, Path.expanduser and Path.__rtruediv__ pass Path objects to _parse_args instead of path parts, to preserve the path information.
  3. To solve the third bug I had to make small changes to both the resolve method and to Path.__str__.
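
The idea of accepting tuples alongside strings can be sketched as follows. This is only an illustration under assumptions, not the code in this PR: split_part is a hypothetical helper, and it uses ntpath.splitdrive from the standard library rather than pathlib's internal parser.

```python
import ntpath


def split_part(part):
    """Return (drive, root, names) for a single argument.

    Hypothetical helper, not the PR's parse_parts. `part` is either:
      * a raw string, which has to be parsed and may be ambiguous, or
      * an already-parsed (drive, root, names) tuple, trusted as-is.
    """
    if isinstance(part, tuple):
        # Already parsed: keep the recorded drive and root instead of
        # guessing them again from a flattened string.
        drive, root, names = part
        return drive, root, list(names)

    # Raw string: normalise separators, then split off drive and root.
    drive, rest = ntpath.splitdrive(part.replace('/', '\\'))
    root = '\\' if rest.startswith('\\') else ''
    # Drop empty components and '.' components.
    names = [name for name in rest.split('\\') if name and name != '.']
    return drive, root, names


# The bare string is read as drive 'a:' plus the relative part 'b'.
print(split_part('a:b'))

# The tuple form says explicitly that there is no drive, so 'a:b'
# stays a single name (file 'a' with the data-stream 'b').
print(split_part(('', '', ['a:b'])))
```

With the string form, the caller cannot express "no drive, one name containing a colon"; with the tuple form, that information survives the call.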

Notes
In addition to the changes in the code, I've added regression tests and modified old incorrect tests.
Details about drive-relative paths can be found here.

https://bugs.python.org/issue36305