[Python-Dev] My summary of the scandir (PEP 471)

Paul Moore p.f.moore at gmail.com
Tue Jul 1 23:20:17 CEST 2014


On 1 July 2014 14:00, Ben Hoyt <benhoyt at gmail.com> wrote:

> 2) Nick Coghlan's proposal on the previous thread (https://mail.python.org/pipermail/python-dev/2014-June/135261.html) suggesting an ensure_lstat keyword param to scandir if you need the lstat_result value
>
> I would make one small tweak to Nick Coghlan's proposal to make writing cross-platform code easier. Instead of .lstat_result being None sometimes (on POSIX), have it None always unless you specify ensure_lstat=True. (Actually, call it get_lstat=True to kind of make this more obvious.) Per (b) above, this means Windows developers wouldn't accidentally write code which failed on POSIX systems -- it'd fail fast on Windows too if you accessed .lstat_result without specifying get_lstat=True.

This is getting very complicated (at least to me, as a Windows user, where the basic idea seems straightforward).

It seems to me that the right model is the standard "thin wrapper round the OS feature" that acts as a building block - it's typical of the rest of the os module. I think that thin wrapper is needed - even if the various bells and whistles are useful, they can be built on top of a low-level version (whereas the converse is not the case). Typically, such thin wrappers expose POSIX semantics by default, and Windows behaviour follows as closely as possible (see for example stat, where st_ino makes no sense on Windows, but is present). In this case, we're exposing Windows semantics, and POSIX is the one needing to fit the model, but the principle is the same.

On that basis, optional attributes (as used in stat results) seem entirely sensible.
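
For anyone who hasn't run into the optional attributes on stat results, here is a minimal sketch of how code typically copes with them. The helper name optional_stat_fields and the choice of fields are mine, purely for illustration; the fields listed are ones the os.stat documentation describes as platform-dependent, and which of them show up depends on the OS and Python version.

```python
import os

# Attributes that os.stat_result only carries on some platforms.
# This selection is illustrative, not exhaustive.
OPTIONAL_FIELDS = (
    "st_blocks",           # some Unix systems
    "st_blksize",          # some Unix systems
    "st_rdev",             # some Unix systems
    "st_flags",            # some BSD-derived systems
    "st_file_attributes",  # Windows, on Python versions that provide it
)


def optional_stat_fields(path="."):
    """Return a dict of the optional stat attributes present for path."""
    result = os.stat(path)
    return {
        name: getattr(result, name)
        for name in OPTIONAL_FIELDS
        if hasattr(result, name)
    }
```

Callers test with hasattr (or getattr with a default) before using a field, which is the same check a DirEntry with optional attributes would need.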

The documentation for DirEntry could easily be written to parallel that of a stat result:

""" The return value is an object whose attributes correspond to the data the OS returns about a directory entry:

On Windows, the following attributes are also available

That's no harder to understand (or to work with) than the equivalent stat result. The only difference is that the unavailable attributes can still be queried on POSIX; doing so just involves a separate system call (with implications in terms of performance, error handling and potential race conditions).

The version of scandir with the ensure_lstat argument is easy to write based on one with optional attributes (I'm playing fast and loose with adding attributes to DirEntry values here, just for the sake of an example - the details are left as an exercise):

import os

def scandir_ensure(path='.', ensure_lstat=False):
    for entry in os.scandir(path):
        # On POSIX the size/time attributes are absent, so fetch them
        # with a separate lstat call when the caller asks for them.
        if ensure_lstat and not hasattr(entry, 'st_size'):
            stat_data = os.lstat(entry.full_name)
            entry.st_size = stat_data.st_size
            entry.st_atime = stat_data.st_atime
            entry.st_mtime = stat_data.st_mtime
            entry.st_ctime = stat_data.st_ctime
            # Ignore file_attributes, as we'll never get here on Windows
        yield entry

Variations on how you handle errors in the lstat call, etc, can be added to taste.
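
As one possible variation, here is a rough sketch of the error handling, written against os.listdir and os.lstat so that it does not depend on any particular scandir API. The function name list_with_lstat and the on_error callback are hypothetical, chosen only for this example. An entry can disappear between the directory listing and the lstat call, so the sketch reports the error to the caller (if asked) and yields None in place of the stat data.

```python
import os


def list_with_lstat(path=".", on_error=None):
    """Yield (name, lstat result or None) for each entry in path.

    Hypothetical helper: if os.lstat fails (for example because the
    entry was removed after the directory was listed), the exception
    is passed to on_error when one is given, and None is yielded
    instead of the stat data.
    """
    for name in os.listdir(path):
        full_name = os.path.join(path, name)
        try:
            stat_data = os.lstat(full_name)
        except OSError as exc:
            if on_error is not None:
                on_error(exc)
            stat_data = None
        yield name, stat_data
```

Whether to skip the entry, yield it without stat data, or re-raise is exactly the kind of policy decision that is easier to leave to a layer on top of the low-level call.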

Please, let's stick to a low-level wrapper round the OS API for the first iteration of this feature. Enhancements can be added later, when real-world usage has proved their value.

Paul


