[Python-Dev] PEP 471 (scandir): Add a new DirEntry.inode() method?
Victor Stinner victor.stinner at gmail.com
Fri Feb 13 10:46:55 CET 2015
Hi,
TL;DR: on POSIX, is it useful to know the inode number (st_ino) without the device number (st_dev)?
While reading feedback on the Python 3.5 alpha 1 release, I saw a comment saying that the current design of os.scandir() (PEP 471) doesn't fit a very specific use case where the inode number is needed:
"Ah, turns out we needed even more optimizations than that is able to give us; in particular, the underlying system readdir call gives us the inode number, which we need to compare against a cache of hard links, in order to avoid having to stat the underlying files if we've already done so on another hard link. It looks like the DirEntry API used here only includes the path and name, not the inode number, without invoking another stat call, and we needed to optimize out that extra stat call." https://www.reddit.com/r/Python/comments/2synry/so_8_peps_are_currently_being_proposed_for_python/cnvnz1w
Since the C function readdir() provides the inode number (d_ino field of the dirent structure), I propose adding a new DirEntry.inode() method.
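A minimal sketch of the use case quoted above, assuming the proposed DirEntry.inode() method is available. The function name stat_tree_with_hardlink_cache() is hypothetical, and the cache is keyed on the inode number alone, so it is only valid if every entry lives on the same device as the directory:

```python
import os

def stat_tree_with_hardlink_cache(directory):
    """Return ({path: stat_result}, number_of_lstat_calls) for one directory.

    Hypothetical helper: entries sharing an inode number (hard links) are
    stat'ed only once. Assumes all entries share the directory's st_dev.
    """
    cache = {}       # inode number -> stat_result
    results = {}     # entry path -> stat_result
    stat_calls = 0
    for entry in os.scandir(directory):
        ino = entry.inode()          # proposed method; no stat call on POSIX
        if ino not in cache:
            cache[ino] = os.lstat(entry.path)   # the only explicit syscall
            stat_calls += 1
        results[entry.path] = cache[ino]
    return results, stat_calls
```

With two hard links to the same file in a directory, the second link is served from the cache and no second lstat() is issued.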
*** Now the real question: is it useful to know the inode number (st_ino) without the device number (st_dev)? ***
On POSIX, you can still get st_dev from DirEntry.stat(), but it always requires a system call. So you lose the whole purpose of DirEntry (no extra syscall).
I wrote a script, check_stdev.py, attached to this email, to check whether all entries of a directory have the same st_dev value as the directory itself:
- same for /usr/bin, /usr/lib, /tmp, /proc, ...
- different for /dev
What about "union" file systems like UnionFS or things like "mount -o bind"? Can someone test? Does anyone have some information?
So the answer looks to be: it's useful for all directories except /dev. Example:
/dev/hugepages st_dev is different: 35 vs 5
/dev/mqueue st_dev is different: 13 vs 5
/dev/pts st_dev is different: 11 vs 5
/dev/shm st_dev is different: 17 vs 5
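The st_dev comparison described above can be sketched as follows. This is not the attached check_stdev.py, which is not reproduced here; the function names find_st_dev_mismatches() and report() are hypothetical, and the output format simply mimics the example lines:

```python
import os

def find_st_dev_mismatches(directory):
    """Return [(path, entry_st_dev, directory_st_dev)] for entries whose
    st_dev differs from the st_dev of the directory itself."""
    dir_dev = os.lstat(directory).st_dev
    mismatches = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        try:
            entry_dev = os.lstat(path).st_dev
        except OSError:
            # Entry vanished or is not accessible: skip it.
            continue
        if entry_dev != dir_dev:
            mismatches.append((path, entry_dev, dir_dev))
    return mismatches

def report(directory):
    """Print one line per mismatching entry."""
    for path, entry_dev, dir_dev in find_st_dev_mismatches(directory):
        print("%s st_dev is different: %s vs %s" % (path, entry_dev, dir_dev))
```

Calling report("/dev") on a Linux system would be the way to look for mount points such as the ones listed above; results depend on what is mounted on the machine.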
On POSIX, DirEntry.inode() just exposes the d_ino value from readdir().
On Windows, FindFirstFileW/FindNextFileW return almost a full stat_result structure, except for the st_ino, st_dev and st_nlink fields, which are set to 0.
So DirEntry.inode() has to call os.lstat() to read the inode number. The inode number will be cached by DirEntry.inode() in the DirEntry object, but the os.lstat() result is dropped.
On Windows, I don't want to cache the full os.lstat() result from DirEntry.inode() into DirEntry to replace the previous incomplete stat_result from FindFirstFileW/FindNextFileW, because DirEntry.stat() would then return a different result (st_ino, st_dev, st_nlink fields set or not) depending on whether the inode() method was called or not.
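A pure-Python model of that caching policy, as a sketch only. The class WindowsLikeEntry and its partial_stat argument are hypothetical and stand in for the C implementation; the point is that inode() keeps the inode number and drops the rest of the os.lstat() result, so stat() is unaffected:

```python
import os

class WindowsLikeEntry:
    """Hypothetical model of DirEntry on Windows (not the real implementation)."""

    def __init__(self, path, partial_stat):
        self.path = path
        # Incomplete stat data obtained from the directory listing.
        self._partial_stat = partial_stat
        self._inode = None

    def stat(self):
        # Always the listing data, whether or not inode() was called.
        return self._partial_stat

    def inode(self):
        if self._inode is None:
            # One os.lstat() call; only st_ino is kept, the rest is dropped.
            self._inode = os.lstat(self.path).st_ino
        return self._inode
```

Because only the inode number is stored, two calls to stat() return the same object before and after inode() is called.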
Note: scandir-6.patch of http://bugs.python.org/issue22524 contains an implementation of os.scandir() with DirEntry.inode(), if you want to play.
Victor
-------------- next part --------------
A non-text attachment was scrubbed...
Name: check_stdev.py
Type: text/x-python
Size: 300 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-dev/attachments/20150213/198967c7/attachment.py>