On 25 Nov 2013 09:07, "Ben Hoyt" <benhoyt@gmail.com> wrote:
\>
\> > Right now, pathlib doesn't cache. Guido decided it was safer to start
\> > off like that, and perhaps later we can add some optional caching.
\> >
\> > One reason caching didn't go in is that it's not clear which API is
\> > best. Working on plugging scandir() into pathlib would actually help
\> > choose a stat-caching API.
\> >
\> > (or, rather, lstat-caching...)
\> >
\> >> The other related thing is that DirEntry only provides .lstat(),
\> >> because it's providing stat-like info without following links.
\> >
\> > Path.is\_dir() and friends use stat(), i.e. they inform you about
\> > whether a symlink's target is a directory (not the symlink itself). Of
\> > course, if the DirEntry says the path is a symlink, Path.is\_dir() could
\> > then run stat() to find out about the target.
\> >
\> > Do you plan to propose scandir() for inclusion in the stdlib?
\>
\> Yes, I was hoping to propose adding "os.scandir() -> yields DirEntry
\> objects" for inclusion into the stdlib, and also speed up os.walk() as
\> a result.
\>
\> However, pathlib's API with .is\_dir() and .lstat() etc are so close to
\> DirEntry, I'd be much keener to roll up the scandir functionality into
\> pathlib's iterdir(), as that's already going in the standard library,
\> and iterdir() already returns Path objects.
\>
\> I'm just not sure it's possible or useful without stat caching.
\>
\> We could do Path.lstat(cached=True), but we'd also really want
\> is\_dir(cached=True), so that API kinda sucks. Alternatively you could
\> have iterdir(cached=True) return PathWithCachedStat style objects --
\> probably better, but kinda messy.
\>
\> For these reasons, I would much prefer stat caching on by default in
\> Path -- in my experience, the cached behaviour is desired much much
\> more often than the non-cached. I've written directory walkers more
\> often than I can count, whereas I've maybe only once written a
\> long-running process that needs to re-stat, and if it's clearly
\> documented as cached, then it's super easy to call restat(), or create
\> a new Path instance to get new stat info.
\>
\> This would allow iterdir() to take advantage of the huge performance
\> improvements you can get when walking directories.
\>
\> Guido, are you at all open to reconsidering the uncached-by-default in
\> light of this?

No, caching on the object is dangerously unintuitive - it means two Path objects can compare equal, but give different answers for stat-dependent queries.
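
To make the hazard concrete, here is a minimal sketch. `CachedPath` is a hypothetical wrapper invented for illustration; it is not a pathlib API and not one of the designs proposed in this thread. It caches the `lstat()` result on the instance, so two objects that compare equal can report different sizes for the same file.

```python
import os
import tempfile
from pathlib import Path


class CachedPath:
    """Hypothetical wrapper that caches lstat() per instance (illustration only)."""

    def __init__(self, path):
        self.path = Path(path)
        self._lstat = None

    def __eq__(self, other):
        # Equality depends only on the path, never on the cached stat data.
        return isinstance(other, CachedPath) and self.path == other.path

    def __hash__(self):
        return hash(self.path)

    def lstat(self):
        # The first call hits the filesystem; later calls reuse the stored result.
        if self._lstat is None:
            self._lstat = os.lstat(self.path)
        return self._lstat


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "data.txt")
    with open(target, "w") as f:
        f.write("a")

    first = CachedPath(target)
    size_first = first.lstat().st_size    # cached on `first` while the file is 1 byte

    with open(target, "w") as f:
        f.write("abcdef")                 # the file changes on disk

    second = CachedPath(target)
    size_second = second.lstat().st_size  # fresh lookup on a new object

    same_path = first == second           # the two objects compare equal...
    stale_size = first.lstat().st_size    # ...but `first` still reports the old size
```

After this runs, `same_path` is true while `stale_size` and `size_second` disagree, which is exactly the inconsistency described above.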

A global string (or Path) keyed cache (rather than a per-object cache) would actually be a safer option, since it would ensure distinct path objects always gave the same answer. That's the approach I will likely pursue at some point in walkdir.

It's also quite likely the "rich stat object" API will be pursued for 3.5, which is a much safer approach to stat result caching than trying to embed it directly in pathlib.Path objects.

That's why we decided to punt on the caching question until 3.5 - it's better to provide a predictable building block that doesn't provide caching, and then work out how to provide a sensible caching layer on top of that, rather than trying to rush a potentially flawed caching design that leads to inconsistent behaviour.

Cheers,
Nick.

\>
\> -Ben