[Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
Ben Hoyt benhoyt at gmail.com
Mon Jun 30 19:05:54 CEST 2014
So, here's my alternative proposal: add an "ensurelstat" flag to scandir() itself, and don't have any methods on DirEntry, only attributes.
That would make the DirEntry attributes:

    isdir: boolean, always populated
    isfile: boolean, always populated
    issymlink: boolean, always populated
    lstatresult: stat result, may be None on POSIX systems if ensurelstat is False

(I'm not particularly sold on "lstatresult" as the name, but "lstat" reads as a verb to me, so doesn't sound right as an attribute name)

What this would allow:

- by default, scanning is efficient everywhere, but lstatresult may be None on POSIX systems
- if you always need the lstat result, setting "ensurelstat" will trigger the extra system call implicitly
- if you only sometimes need the stat result, you can call os.lstat() explicitly when the DirEntry lstat attribute is None

Most importantly, regardless of platform, the cached stat result (if not None) would reflect the state of the entry at the time the directory was scanned, rather than at some arbitrary later point in time when lstat() was first called on the DirEntry object.

There'd still be a slight window of discrepancy (since the filesystem state may change between reading the directory entry and making the lstat() call), but this could be effectively eliminated from the perspective of the Python code by making the result of the lstat() call authoritative for the whole DirEntry object.
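To make the proposal above concrete, here is a rough sketch of an attribute-only entry with an "ensurelstat" flag. The names Entry and scandir_attrs are hypothetical and not part of any os API. The sketch assumes a Python where os.scandir() is available (3.6 or later for the with-statement form) and simply repackages its results, so it only illustrates the shape of the interface, not how it would really be implemented.

```python
import os
import stat
from collections import namedtuple

# Hypothetical attribute-only entry; field names follow the proposal above.
Entry = namedtuple("Entry", "name isdir isfile issymlink lstatresult")


def scandir_attrs(path=".", ensurelstat=False):
    """Yield Entry tuples for path (illustrative helper, not a real os API)."""
    with os.scandir(path) as it:
        for de in it:
            if ensurelstat or os.name == "nt":
                # stat(follow_symlinks=False) behaves like lstat(); on Windows
                # the directory scan already provides this data.
                st = de.stat(follow_symlinks=False)
                mode = st.st_mode
                # Treat the lstat result as authoritative for the type flags.
                yield Entry(de.name, stat.S_ISDIR(mode), stat.S_ISREG(mode),
                            stat.S_ISLNK(mode), st)
            else:
                # No lstat requested: type flags only, lstatresult stays None.
                yield Entry(de.name,
                            de.is_dir(follow_symlinks=False),
                            de.is_file(follow_symlinks=False),
                            de.is_symlink(),
                            None)
```

A caller that only sometimes needs the stat data would check "entry.lstatresult is None" and fall back to an explicit os.lstat() call, as described in the last bullet above.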
Yeah, I quite like this. It does make the caching more explicit and consistent. It's slightly annoying that it's less like pathlib.Path now, but DirEntry was never pathlib.Path anyway, so maybe it doesn't matter. The differences in naming may highlight the difference in caching, so maybe it's a good thing.
Two further questions from me:
How does error handling work? Now os.stat() will/may be called during iteration, that is, inside __next__. But it's hard to catch errors because you don't call __next__ explicitly. Is this a problem? How do other iterators that make system calls or raise errors handle this?
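One possible answer, sketched below, is that a caller who cares can drive the iterator by hand with next() and wrap that call in try/except. LstatIter and scan_tolerant are hypothetical names, and the iterator is built on os.listdir() plus os.lstat() purely for illustration. Note that this only works because LstatIter is a class with its own __next__; a generator function would be finished after the first exception it raised.

```python
import os


class LstatIter:
    """Hypothetical iterator that calls os.lstat() inside __next__."""

    def __init__(self, path):
        self._path = path
        self._names = iter(os.listdir(path))

    def __iter__(self):
        return self

    def __next__(self):
        # StopIteration from the inner iterator propagates and ends the scan.
        name = next(self._names)
        # os.lstat() may raise OSError, e.g. if the entry was just deleted.
        return name, os.lstat(os.path.join(self._path, name))


def scan_tolerant(path):
    """Call next() explicitly so one failed lstat skips only that entry."""
    results, errors = [], []
    it = LstatIter(path)
    while True:
        try:
            results.append(next(it))
        except StopIteration:
            break
        except OSError as exc:
            errors.append(exc)
    return results, errors
```

A plain for loop over the same iterator would instead stop at the first OSError, which is the awkwardness the question is about.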
There's still the open question in the PEP of whether to include a way to access the full path. This is cheap to build, it has to be built anyway on POSIX systems, and it's quite useful for further operations on the file. I think the best way to handle this is a .fullname or .full_name attribute as suggested elsewhere. Thoughts?
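As a small illustration of why the full path is handy for further operations, here is a sketch that carries it alongside the bare name. NamedEntry, iter_named and total_size are hypothetical names, and the entries come from os.listdir() rather than from any real scandir implementation.

```python
import os
import stat
from collections import namedtuple

# Hypothetical entry carrying both the bare name and the joined path.
NamedEntry = namedtuple("NamedEntry", "name full_name")


def iter_named(path):
    """Yield NamedEntry tuples; full_name is just os.path.join(path, name)."""
    for name in os.listdir(path):
        yield NamedEntry(name, os.path.join(path, name))


def total_size(path):
    """Sum the sizes of regular files directly inside path."""
    total = 0
    for entry in iter_named(path):
        # full_name can be passed straight to other os functions.
        st = os.lstat(entry.full_name)
        if stat.S_ISREG(st.st_mode):
            total += st.st_size
    return total
```

Without the attribute, every caller ends up repeating the os.path.join() call before it can do anything else with the entry.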
-Ben