[Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info
Ben Hoyt benhoyt at gmail.com
Mon May 13 00:04:11 CEST 2013
> And if we're creating a custom object instead, why return a 2-tuple rather than making the entry's name an attribute of the custom object?
>
> To me, that suggests a more reasonable API for os.scandir() might be for it to be an iterator over "dir_entry" objects:
>
>     name (as a string)
>     is_file()
>     is_dir()
>     is_link()
>     stat()
>     cached_stat (None or a stat object)
Nice! I really like your basic idea of returning a custom object instead of a 2-tuple. And I agree with Christian that .stat() would be clearer if it were called .lstat(). I also like your later idea of simply exposing .dirent (which would be None on Windows).
One tweak I'd suggest is that is_file() etc be called isfile() etc without the underscore, to match the naming of the os.path.is* functions.
> That would actually make sense at an implementation level anyway - is_file() etc would check self.cached_lstat first, and if that was None they would check self.dirent, and if that was also None they would raise an error.
Hmm, I'm not sure about this at all. Are you suggesting that the DirEntry object's is* functions would raise an error if both cached_lstat and dirent were None? Wouldn't it make for a much simpler API to just call os.lstat() and populate cached_lstat instead? As far as I'm concerned, that'd be the point of making DirEntry.lstat() a function.
In fact, I don't think .cached_lstat should be exposed to the user. They just call entry.lstat(), and it returns a cached stat or calls os.lstat() to get the real stat if required (and populates the internal cached stat value). And the entry.is* functions would call entry.lstat() if dirent was None or d_type was DT_UNKNOWN. This would change relatively nasty code like this:
    files = []
    dirs = []
    for entry in os.scandir(path):
        try:
            isdir = entry.isdir()
        except NotPresentError:
            st = os.lstat(os.path.join(path, entry.name))
            isdir = stat.S_ISDIR(st.st_mode)
        if isdir:
            dirs.append(entry.name)
        else:
            files.append(entry.name)
Into nice clean code like this:
    files = []
    dirs = []
    for entry in os.scandir(path):
        if entry.isdir():
            dirs.append(entry.name)
        else:
            files.append(entry.name)
This change would make scandir() usable by ordinary mortals, rather than just hardcore library implementors.
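To make the caching behaviour concrete, here is a minimal runnable sketch. It is only an illustration: LazyEntry, scandir_sketch(), split_entries() and the lstat_calls counter are hypothetical names, the generator is built on os.listdir() rather than a real directory iterator, and it ignores dirent completely, so every is* call goes through lstat().

```python
import os
import stat
import tempfile


class LazyEntry:
    """Hypothetical stand-in for the proposed DirEntry (no dirent info)."""

    def __init__(self, name, path="."):
        self.name = name
        self._path = path
        self._lstat = None      # filled in by the first lstat() call
        self.lstat_calls = 0    # illustration only: counts real os.lstat() calls

    def lstat(self):
        # Hit the filesystem once, then keep returning the cached result.
        if self._lstat is None:
            self.lstat_calls += 1
            self._lstat = os.lstat(os.path.join(self._path, self.name))
        return self._lstat

    def isdir(self):
        return stat.S_ISDIR(self.lstat().st_mode)

    def isfile(self):
        return stat.S_ISREG(self.lstat().st_mode)


def scandir_sketch(path):
    """Yield LazyEntry objects; os.listdir() is used purely for illustration."""
    for name in os.listdir(path):
        yield LazyEntry(name, path)


def split_entries(path):
    """The 'nice clean code' from above, wrapped in a function."""
    files, dirs = [], []
    for entry in scandir_sketch(path):
        if entry.isdir():
            dirs.append(entry.name)
        else:
            files.append(entry.name)
    return sorted(files), sorted(dirs)


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    open(os.path.join(tmp, "a.txt"), "w").close()
    print(split_entries(tmp))   # (['a.txt'], ['sub'])

    entry = LazyEntry("a.txt", tmp)
    entry.isdir()
    entry.isfile()
    entry.lstat()
    print(entry.lstat_calls)    # 1: three queries, one real os.lstat()
```

The caller never sees an exception for "information not present" and never has to join paths by hand; the cost of the fallback is at most one os.lstat() per entry.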
In other words, I'm proposing that the DirEntry objects yielded by scandir() would have .name and .dirent attributes, and .isdir(), .isfile(), .islink(), .lstat() methods, and look basically like this (though presumably implemented in C):
    import os
    import stat

    # DT_UNKNOWN, DT_DIR, DT_REG and DT_LNK below are the d_type constants
    # from <dirent.h>; they are not defined in Python's os module.

    class DirEntry:
        def __init__(self, name, dirent, lstat, path='.'):
            # User shouldn't need to call this, but called internally by scandir()
            self.name = name
            self.dirent = dirent
            self._lstat = lstat  # non-public attributes
            self._path = path

        def lstat(self):
            if self._lstat is None:
                self._lstat = os.lstat(os.path.join(self._path, self.name))
            return self._lstat

        def isdir(self):
            if self.dirent is not None and self.dirent.d_type != DT_UNKNOWN:
                return self.dirent.d_type == DT_DIR
            else:
                return stat.S_ISDIR(self.lstat().st_mode)

        def isfile(self):
            if self.dirent is not None and self.dirent.d_type != DT_UNKNOWN:
                return self.dirent.d_type == DT_REG
            else:
                return stat.S_ISREG(self.lstat().st_mode)

        def islink(self):
            if self.dirent is not None and self.dirent.d_type != DT_UNKNOWN:
                return self.dirent.d_type == DT_LNK
            else:
                return stat.S_ISLNK(self.lstat().st_mode)
Oh, and the .dirent would either be None (Windows) or would have .d_type and .d_ino attributes (Linux, OS X).
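Here is a rough, self-contained sketch of how the three cases (d_type known, d_type unknown, no dirent at all) would behave. The assumptions are mine: the DT_* values are hard-coded from Linux's <dirent.h>, Dirent is a hypothetical namedtuple standing in for whatever the C implementation would expose, d_ino is just set to 0, and the _has_type() helper only exists to keep the example short.

```python
import collections
import os
import stat
import tempfile

# d_type values as defined in Linux <dirent.h>; hard-coded here because
# Python's os module does not export them.
DT_UNKNOWN, DT_DIR, DT_REG, DT_LNK = 0, 4, 8, 10

# Hypothetical shape for the .dirent attribute.
Dirent = collections.namedtuple("Dirent", ["d_type", "d_ino"])


class DirEntry:
    def __init__(self, name, dirent, lstat=None, path="."):
        self.name = name
        self.dirent = dirent
        self._lstat = lstat
        self._path = path

    def lstat(self):
        if self._lstat is None:
            self._lstat = os.lstat(os.path.join(self._path, self.name))
        return self._lstat

    def _has_type(self, dt_const, mode_test):
        # Trust d_type when the OS supplied one, otherwise fall back to lstat().
        if self.dirent is not None and self.dirent.d_type != DT_UNKNOWN:
            return self.dirent.d_type == dt_const
        return mode_test(self.lstat().st_mode)

    def isdir(self):
        return self._has_type(DT_DIR, stat.S_ISDIR)

    def isfile(self):
        return self._has_type(DT_REG, stat.S_ISREG)

    def islink(self):
        return self._has_type(DT_LNK, stat.S_ISLNK)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "a.txt")
    open(target, "w").close()

    # Linux / OS X with a known d_type: answered without calling os.lstat().
    known = DirEntry("a.txt", Dirent(DT_REG, 0), path=tmp)
    print(known.isfile(), known._lstat is None)       # True True

    # d_type unknown: the first is*() call populates the cached stat.
    unknown = DirEntry("a.txt", Dirent(DT_UNKNOWN, 0), path=tmp)
    print(unknown.isfile(), unknown._lstat is None)   # True False

    # Windows: no dirent, but scandir() passes in the stat it already has.
    win = DirEntry("a.txt", None, lstat=os.lstat(target), path=tmp)
    print(win.isdir(), win.islink())                  # False False
```

In all three cases the calling code is identical, which is the point: the platform differences stay inside DirEntry.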
This would make the scandir() API nice and simple to use for callers, but still expose all the information the OS provides (both the meaningful fields in dirent, and a full stat on Windows, nicely cached in the DirEntry object).
Thoughts?
-Ben