[Python-Dev] Updates to PEP 471, the os.scandir() proposal

Victor Stinner victor.stinner at gmail.com
Thu Jul 10 02:15:58 CEST 2014


2014-07-09 17:29 GMT+02:00 Ben Hoyt <benhoyt at gmail.com>:

Would this not "break" the tree size script being discussed in the other thread, as it would follow links and include linked directories in the "size" of the tree?

The get_tree_size() function in the PEP would use: "if not entry.is_symlink() and entry.is_dir():".

Note: At first I wrote "if entry.is_dir() and not entry.is_symlink():", but that order is slower on Linux because is_dir() has to call lstat().
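For illustration, here is a minimal sketch of what such a get_tree_size() could look like. It is written against os.scandir() as it later shipped in Python 3.5+, so entry.path and entry.stat(follow_symlinks=False) are assumptions taken from that final API rather than from the draft discussed in this thread.

```python
import os


def get_tree_size(path):
    """Return the total size in bytes of the entries under path.

    Symlinks are never followed: a link contributes only the size of
    the link itself, never the size of whatever it points to.
    """
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            # Checking is_symlink() first short-circuits for symlinks,
            # so is_dir() is never asked to resolve a link's target.
            if not entry.is_symlink() and entry.is_dir():
                total += get_tree_size(entry.path)
            else:
                # follow_symlinks=False gives lstat()-like behaviour.
                total += entry.stat(follow_symlinks=False).st_size
    return total
```

With this check, a symlink to a large directory outside the tree does not inflate the result, which matches the default behaviour of du.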

Adding an optional keyword to DirEntry.is_dir() would make it possible to write "if entry.is_dir(follow_symlink=False)", but it looks like a micro-optimization and, as I said, I prefer to stick to the pathlib.Path API (which was already heavily discussed in its PEP). Anyway, this case is rare (I explain why below), so we should not worry too much about it.

Yeah, I agree. Victor -- I don't think the DirEntry isX() methods (or attributes) should mimic the link-following os.path.isdir() at all. You want the type of the entry, not the type of the source.

On UNIX, a symlink to a directory is expected to behave like a directory. For example, in a file browser, you should enter the linked directory when you click on a symlink to a directory.

There are only a few cases where you want to handle symlinks differently: archiving (e.g. tar), computing the size of a directory (e.g. du does not follow symlinks by default, du -L follows them), and removing a directory.

You should do a short poll in the Python stdlib and on the Internet to check what is the most common check.

Examples of the Python stdlib:

In this list of 12 examples, only compileall, shutil.rmtree and os.walk check whether entries are symlinks. compileall starts by checking "if not os.path.isdir(fullname):", which follows symlinks. os.walk() starts by checking "if os.path.isdir(name):", which follows symlinks. I consider that only one case out of 12 (8.3%) doesn't follow symlinks.

If entry.is_dir() doesn't follow symlinks, the other 91.7% will need to be modified to use "if entry.is_dir() or (entry.is_symlink() and os.path.isdir(entry.full_name)):" to keep the same behaviour :-(
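To make the cost of that workaround concrete, here is a small sketch. It assumes the API that eventually shipped in Python 3.5+, where the non-following check is spelled entry.is_dir(follow_symlinks=False) and the full path is entry.path (not full_name as in the draft). The helper name is_dir_following_links is hypothetical.

```python
import os


def is_dir_following_links(entry):
    """Rebuild os.path.isdir()-like behaviour from a non-following check.

    This is the boilerplate callers would need if is_dir() did not
    follow symlinks by default.
    """
    return entry.is_dir(follow_symlinks=False) or (
        entry.is_symlink() and os.path.isdir(entry.path)
    )
```

In the final API, a plain entry.is_dir() call gives the same answer, because following symlinks is the default there.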

Otherwise, as Paul says, you are essentially forced to follow links, and os.walk(followlinks=False), which is the default, can't do the right thing.

os.walk() and get_tree_size() are good users of scandir(), but they are recursive functions. It means that you may handle symlinks differently; os.walk(), for example, lets you choose whether or not to follow symlinks.
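The choice offered by os.walk() can be seen with its existing followlinks parameter. The helper below (count_visited_dirs is a hypothetical name) only counts how many directories the walk descends into.

```python
import os


def count_visited_dirs(top, followlinks=False):
    """Count the directories os.walk() yields, starting at top.

    With followlinks=False (the default), symlinks to directories are
    still listed in dirnames but are not descended into.
    """
    return sum(1 for _ in os.walk(top, followlinks=followlinks))
```

For a tree containing a symlink to a directory elsewhere, the count with followlinks=True also includes the linked directory and everything below it.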

Recursive functions are rare. The most common case is to list the files of a single directory and then filter them by various criteria (is it a file? is it a directory? does the file name match? ...). In such a use case, you don't "care" about symlinks (you want to follow them).
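A sketch of that common, non-recursive case, assuming the os.scandir() API as it later shipped in Python 3.5+ (the function name list_dir and its parameters are hypothetical):

```python
import fnmatch
import os


def list_dir(path, pattern="*", want_dirs=False):
    """List the names in a single directory, filtered by type and pattern.

    is_dir() and is_file() follow symlinks by default, so a link to a
    file is treated as a file and a link to a directory as a directory.
    """
    names = []
    with os.scandir(path) as entries:
        for entry in entries:
            kind_ok = entry.is_dir() if want_dirs else entry.is_file()
            if kind_ok and fnmatch.fnmatch(entry.name, pattern):
                names.append(entry.name)
    return sorted(names)
```

Here the caller never has to mention symlinks at all: list_dir(path, "*.py") returns linked Python files alongside regular ones.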

Victor


