bpo-46227: Add pathlib.Path.walk method by zmievsa · Pull Request #30340 · python/cpython

Thanks. I didn't know you could edit the list of directories to prevent recursion - neat!
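For anyone following along, the pruning trick looks roughly like this with plain os.walk(). This is only a minimal sketch: walk_pruned and skip_names are names made up for illustration and are not part of the patch.

```python
import os


def walk_pruned(top, skip_names):
    """Return the directories visited under `top`, skipping `skip_names`.

    Hypothetical helper for illustration only.
    """
    visited = []
    # topdown=True (the default) is what makes pruning possible: os.walk
    # reads the `dirs` list again after yielding, before it descends.
    for root, dirs, files in os.walk(top):
        # Slice assignment mutates the list in place. Rebinding the name
        # (dirs = [...]) would not affect the traversal.
        dirs[:] = [d for d in dirs if d not in skip_names]
        visited.append(root)
    return visited
```

Calling walk_pruned(".", {".git"}) should then list every directory under the current one without ever descending into .git.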

One more thing to consider: in part for performance reasons, pathlib internally represents a path as a drive, root, and list of string parts. This avoids some costly splitting and joining on directory separators -- for example, Path.parent doesn't parse anything or create any new string objects, whereas os.path.dirname() does.
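To make that concrete, here is a small sketch using only public attributes. It shows the pieces pathlib exposes and the string-based equivalent. It does not show pathlib's private internals, and posixpath is used instead of os.path only so the result is the same on every platform.

```python
import posixpath
from pathlib import PurePosixPath

p = PurePosixPath("/usr/lib/python3")

# The parsed representation is visible through public attributes.
print(p.drive)   # '' (POSIX paths have no drive)
print(p.root)    # '/'
print(p.parts)   # ('/', 'usr', 'lib', 'python3')

# Works from the already-parsed path object.
parent = p.parent

# Works on the string form, so it has to look for separators again.
string_parent = posixpath.dirname(str(p))

print(parent)         # /usr/lib
print(string_parent)  # /usr/lib
```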

Pathlib has historically re-implemented algorithms like os.path.realpath(), os.path.expanduser(), etc., and modified them to work with the (drive, root, parts) representation. The idea is that this is faster than round-tripping via strings (and hence re-parsing and re-normalising).

In the last couple of releases I've removed pathlib's own realpath() and expanduser() implementations, and made it call through to os.path instead. This solved a couple of minor bugs and removed a bit of duplication in the CPython codebase, but might have cost some performance. More recently I've been wondering where to draw the line on that sort of thing.

In the case of your patch, the key line is root_path = Path(root). By using the main Path initialiser you're asking pathlib to re-parse and re-normalise a path. Compare this to the Path.glob() implementation, which does everything it can to avoid re-parsing paths.
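For reference, the pattern I mean is roughly the following. This is a simplified sketch that assumes a thin wrapper around os.walk(); walk_paths is a made-up name and this is not the actual code from the patch.

```python
import os
from pathlib import Path


def walk_paths(top):
    """Yield (Path, dirnames, filenames) triples, like os.walk().

    Hypothetical wrapper for illustration only.
    """
    for root, dirnames, filenames in os.walk(top):
        # os.walk hands back a plain string, so the Path initialiser has
        # to parse and normalise it from scratch on every iteration.
        root_path = Path(root)
        # dirnames is passed through unchanged, so callers can still
        # prune it in place to stop recursion.
        yield root_path, dirnames, filenames
```

The round trip happens once per directory visited, not once per file, which is part of why I don't expect it to matter much in practice.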

I suspect this is all fine, and that practicality beats purity here.