[Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None)

Victor Stinner victor.stinner at gmail.com
Tue Jul 1 09:44:02 CEST 2014


Hi,

IMO we must decide whether scandir() should support file descriptors. It's an important decision because it has a large impact on the API.

To support scandir(fd), the minimum is to store dir_fd in DirEntry; dir_fd would be None for scandir(str).
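A minimal sketch of that idea, not the PEP's API: FdDirEntry and scandir_with_dir_fd are hypothetical names, and the code assumes a Python and platform where os.scandir() accepts an open directory descriptor (os.scandir listed in os.supports_fd).

```python
import os
from dataclasses import dataclass
from typing import Iterator, Optional, Union


@dataclass
class FdDirEntry:
    """Hypothetical stand-in for a DirEntry that remembers dir_fd."""
    name: str
    dir_fd: Optional[int]  # None when the scan started from a path string


def scandir_with_dir_fd(where: Union[str, int]) -> Iterator[FdDirEntry]:
    """Yield entries that carry the descriptor they were scanned from."""
    # Assumption: os.scandir() accepts an int descriptor on this platform.
    dir_fd = where if isinstance(where, int) else None
    with os.scandir(where) as it:
        for entry in it:
            yield FdDirEntry(entry.name, dir_fd)
```

With this shape, code receiving an entry can check "entry.dir_fd is None" to know whether the name is relative to a path or to an open descriptor.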

scandir(fd) must not close the file descriptor; that should be done by the caller. Handling the lifetime of the file descriptor is a difficult problem, so it's better to let the user decide how to handle it.
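One way the caller could own the descriptor's lifetime is a small context manager. This is only a sketch: open_dir_fd and list_names are hypothetical helpers, and it again assumes a platform where os.scandir() accepts a descriptor.

```python
import os
from contextlib import contextmanager


@contextmanager
def open_dir_fd(path):
    """Hypothetical helper: open a directory, close its fd on exit."""
    fd = os.open(path, os.O_RDONLY)
    try:
        yield fd
    finally:
        os.close(fd)


def list_names(path):
    """Return the sorted entry names of a directory, scanned through an fd."""
    with open_dir_fd(path) as fd:
        with os.scandir(fd) as it:
            names = sorted(entry.name for entry in it)
        # The scan is finished, but the caller's descriptor is still open;
        # os.fstat() would raise OSError if it had been closed.
        os.fstat(fd)
    return names
```

The descriptor is closed exactly once, by the code that opened it, whether or not the scan raises.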

There is also the limit on open file descriptors, usually 1024 but sometimes lower. It can be an issue for very deep file hierarchies.
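On Unix the per-process limit can be queried with the resource module. A short sketch (open_file_limit is a hypothetical helper name):

```python
import resource


def open_file_limit():
    """Return the (soft, hard) limit on open file descriptors."""
    # RLIMIT_NOFILE is the maximum number of descriptors the process may hold.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard
```

A tree walk that keeps one descriptor open per directory level needs roughly as many descriptors as the hierarchy is deep, on top of whatever the process already has open, so the soft limit bounds the depth such a walk can reach.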

If we choose to support scandir(fd), it's probably safer not to use it by default in os.walk() (use scandir(str) instead) and to wait until the feature is well tested and its corner cases are well known.

The second step is to enhance pathlib.Path to support an optional file descriptor. Path already has methods that operate on filenames, like chmod(), exists(), rename(), etc.

Example:

    fd = os.open(path, os.O_DIRECTORY)
    try:
        for entry in os.scandir(fd):
            # ... use entry to benefit from the entry cache: is_dir(), lstat_result ...
            path = pathlib.Path(entry.name, dir_fd=entry.dir_fd)
            # ... use path which uses dir_fd ...
    finally:
        os.close(fd)

Problem: if the path object is stored somewhere and used after the loop, Path methods will fail because dir_fd was closed. It's even worse if a new directory reuses the same file descriptor number :-/ (security issue, or at least tricky bugs!)
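The reuse hazard can be reproduced with plain os calls, since POSIX open() returns the lowest-numbered descriptor that is not currently open. The sketch below is Unix-only; demo_fd_reuse and the file names are made up for illustration, and reuse is only expected when nothing else opens a descriptor in between (for example another thread).

```python
import os
import tempfile


def demo_fd_reuse():
    """Show a stale dir_fd silently resolving against another directory."""
    with tempfile.TemporaryDirectory() as first, \
            tempfile.TemporaryDirectory() as second:
        with open(os.path.join(first, "data.txt"), "w") as f:
            f.write("first")             # 5 bytes
        with open(os.path.join(second, "data.txt"), "w") as f:
            f.write("second directory")  # 16 bytes

        stale_fd = os.open(first, os.O_RDONLY | os.O_DIRECTORY)
        size_before = os.stat("data.txt", dir_fd=stale_fd).st_size
        os.close(stale_fd)  # stale_fd is now just a remembered number

        new_fd = os.open(second, os.O_RDONLY | os.O_DIRECTORY)
        try:
            reused = new_fd == stale_fd
            # If the number was reused, the stale value now names `second`.
            size_after = (os.stat("data.txt", dir_fd=stale_fd).st_size
                          if reused else None)
        finally:
            os.close(new_fd)
    return reused, size_before, size_after
```

When the number is reused, the second stat succeeds without any error and reports the size of the file in the other directory, which is exactly the kind of silent mix-up described above.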

Victor