[Python-Dev] pathlib and issue 11406 (a directory iterator returning stat-like info)
Nick Coghlan ncoghlan at gmail.com
Mon Nov 25 00:18:56 CET 2013
On 25 Nov 2013 09:07, "Ben Hoyt" <benhoyt at gmail.com> wrote:
> Right now, pathlib doesn't cache. Guido decided it was safer to start
> off like that, and perhaps later we can add some optional caching.
>
> One reason caching didn't go in is that it's not clear which API is
> best. Working on plugging scandir() into pathlib would actually help
> choosing a stat-caching API.
>
> (or, rather, lstat-caching...)
>
>> The other related thing is that DirEntry only provides .lstat(),
>> because it's providing stat-like info without following links.
>
> Path.isdir() and friends use stat(), i.e. they inform you about
> whether a symlink's target is a directory (not the symlink itself). Of
> course, if the DirEntry says the path is a symlink, Path.isdir() could
> then run stat() to find out about the target.
>
> Do you plan to propose scandir() for inclusion in the stdlib?

Yes, I was hoping to propose adding "os.scandir() -> yields DirEntry objects" for inclusion into the stdlib, and also speed up os.walk() as a result.

However, pathlib's API with .isdir() and .lstat() etc. is so close to DirEntry that I'd be much keener to roll up the scandir functionality into pathlib's iterdir(), as that's already going in the standard library, and iterdir() already returns Path objects.

I'm just not sure it's possible or useful without stat caching. We could do Path.lstat(cached=True), but we'd also really want isdir(cached=True), so that API kinda sucks. Alternatively you could have iterdir(cached=True) return PathWithCachedStat style objects -- probably better, but kinda messy.

For these reasons, I would much prefer stat caching on by default in Path -- in my experience, the cached behaviour is desired much, much more often than the non-cached. I've written directory walkers more often than I can count, whereas I've maybe only once written a long-running process that needs to re-stat, and if it's clearly documented as cached, then it's super easy to call restat(), or create a new Path instance to get new stat info.

This would allow iterdir() to take advantage of the huge performance improvements you can get when walking directories.

Guido, are you at all open to reconsidering the uncached-by-default in light of this?
No, caching on the object is dangerously unintuitive - it means two Path objects can compare equal, but give different answers for stat-dependent queries.
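To make the hazard concrete, here is a minimal sketch of per-object caching. CachingPath is a hypothetical class invented for this illustration, not anything in pathlib; it simply remembers its first lstat() result. Two instances that name the same file compare equal, yet report different sizes once the file changes between their first queries.

```python
import os
import tempfile


class CachingPath:
    """Hypothetical path wrapper that caches its first lstat() result."""

    def __init__(self, path):
        self.path = os.fspath(path)
        self._lstat = None  # per-object cache, filled on first use

    def __eq__(self, other):
        # Equality looks only at the path string, as a pure path would.
        return isinstance(other, CachingPath) and self.path == other.path

    def __hash__(self):
        return hash(self.path)

    def lstat(self):
        if self._lstat is None:
            self._lstat = os.lstat(self.path)
        return self._lstat

    def size(self):
        return self.lstat().st_size


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        name = os.path.join(tmp, "data.txt")
        with open(name, "w") as f:
            f.write("abc")
        old = CachingPath(name)
        old.size()  # caches the 3-byte result on this object

        with open(name, "w") as f:
            f.write("abcdef")  # the file changes on disk
        new = CachingPath(name)  # fresh object, empty cache

        # Equal objects, different answers to a stat-dependent query.
        return old == new, old.size(), new.size()
```

Calling demo() returns (True, 3, 6): the objects are equal, but the older one still answers from its stale cache.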
A global string (or Path) keyed cache (rather than a per-object cache) would actually be a safer option, since it would ensure distinct path objects always gave the same answer. That's the approach I will likely pursue at some point in walkdir.
It's also quite likely the "rich stat object" API will be pursued for 3.5, which is a much safer approach to stat result caching than trying to embed it directly in pathlib.Path objects.
That's why we decided to punt on the caching question until 3.5 - it's better to provide a predictable building block that doesn't provide caching, and then work out how to provide a sensible caching layer on top of that, rather than trying to rush a potentially flawed caching design that leads to inconsistent behaviour.
Cheers, Nick.
-Ben