[Python-Dev] Updates to PEP 471, the os.scandir() proposal

Ethan Furman ethan at stoneleaf.us
Wed Jul 9 03:31:55 CEST 2014


On 07/08/2014 06:08 PM, Ben Hoyt wrote:

>> Just like an attribute does not imply a system call, having a method named 'isdir' /does/ imply a system call, and not having one can be just as misleading.
>
> Why does a method imply a system call? os.path.join() and str.lower() don't make system calls. Isn't it just a matter of clear documentation? Anyway -- less philosophical discussion below.

In this case because the names are exactly the same as the os versions which /do/ make a system call.

> I presume you're suggesting that is_dir/is_file/is_symlink should be regular attributes, and accessing them should never do a system call. But what if the system doesn't support d_type (e.g., Solaris) or the d_type value is DT_UNKNOWN (can happen on Linux, OS X, BSD)? The options are:

So if I'm finally understanding the root problem here:

The solution:

and the new problem:

Have I got that right?

If so, I still like the attribute idea better (surprise!); we just need to revisit the 'ensure_lstat' (or whatever it's called) parameter: instead of a true/false value, it could have a scale:

After all, the programmer should know up front how much of the extra info the task at hand will need.

> We have a choice before us, a fork in the road. :-) We can choose one of these options for the scandir API:

> 1) The current PEP 471 approach. This solves the issue with d_type being missing or DT_UNKNOWN, it doesn't require onerror, and it's a really tidy API that doesn't explode with AttributeErrors if you write code on Windows (without thinking too hard) and then move to Linux. I think all of these points are important -- the cross-platform one not the least, because we want to make it easy, even trivial, for people to write cross-platform code.

Yes, but we don't want a function that sucks equally on all platforms. ;)
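For readers following along, here is a rough sketch of the method-based model that option 1 describes: is_dir() uses type information when it is already known, and otherwise falls back to a cached lstat() call, which is exactly the "hidden" system call being debated. LazyEntry and lazy_scandir are hypothetical names, not PEP 471's implementation; building on os.listdir() means every entry starts with an unknown type, as if the filesystem always reported DT_UNKNOWN.

```python
import os
import stat


class LazyEntry:
    """Hypothetical option-1 style entry: methods may do a system call
    on first use, then cache the result."""

    def __init__(self, dirpath, name, known_is_dir=None):
        self.name = name
        self.full_name = os.path.join(dirpath, name)
        # known_is_dir stands in for what d_type (or the Windows find
        # data) already told us; None means "unknown", like DT_UNKNOWN.
        self._known_is_dir = known_is_dir
        self._lstat = None

    def lstat(self):
        # At most one lstat() system call per entry; later calls hit the cache.
        if self._lstat is None:
            self._lstat = os.lstat(self.full_name)
        return self._lstat

    def is_dir(self):
        # Free answer when the directory listing already supplied the type...
        if self._known_is_dir is not None:
            return self._known_is_dir
        # ...otherwise fall back to lstat(), the system call the caller never sees.
        return stat.S_ISDIR(self.lstat().st_mode)


def lazy_scandir(path):
    # os.listdir() returns names only, so every entry's type is unknown.
    for name in os.listdir(path):
        yield LazyEntry(path, name)
```

The caller's code is identical whether or not the type was known up front, which is the cross-platform tidiness argued for above; the cost is that a method call may or may not touch the disk.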

> 2) Nick Coghlan's model of only fetching the lstat value if ensure_lstat=True, and including an onerror callback for error handling when scandir calls lstat internally. However, as described, we'd also need an ensure_type=True option, so that scandir() isn't way slower than listdir() if you actually don't want the is_X values and d_type is missing/unknown.

With the multi-level version of 'ensure_lstat' we do not need an extra 'ensure_type'.

> For reference, here's what get_tree_size() looks like with this approach, not including error handling with onerror:

>     def get_tree_size(path):
>         total = 0
>         for entry in os.scandir(path, ensure_lstat=1):
>             if entry.is_dir:
>                 total += get_tree_size(entry.full_name)
>             else:
>                 total += entry.lstat_result.st_size
>         return total

And if we added the onerror here it would be a line fragment, as opposed to the extra four lines (at least) for the try/except in the first example (which I cut).
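To make the multi-level 'ensure_lstat' idea concrete, here is a minimal sketch built as a thin wrapper over the os.scandir() that exists in Python 3.6+. Everything the caller asks for is fetched up front and exposed as plain attributes, so attribute access never does I/O, and errors go to an optional onerror callback. The scan() function, the three level constants, and their meanings are hypothetical, chosen for illustration; they are not the scale or API from this thread.

```python
import os
import stat
from types import SimpleNamespace

# Hypothetical levels for a multi-level ensure_lstat (assumed, illustrative).
NAMES_ONLY, ENSURE_TYPE, ENSURE_LSTAT = 0, 1, 2


def scan(path, ensure_lstat=NAMES_ONLY, onerror=None):
    """Hypothetical scandir-like wrapper: requested info becomes plain
    attributes; OSErrors go to onerror (or propagate if it is None)."""
    try:
        it = os.scandir(path)
    except OSError as exc:
        if onerror is None:
            raise
        onerror(exc)
        return
    with it:
        for de in it:
            entry = SimpleNamespace(name=de.name, full_name=de.path,
                                    is_dir=None, lstat_result=None)
            try:
                if ensure_lstat >= ENSURE_LSTAT:
                    st = de.stat(follow_symlinks=False)  # lstat semantics
                    entry.lstat_result = st
                    entry.is_dir = stat.S_ISDIR(st.st_mode)
                elif ensure_lstat >= ENSURE_TYPE:
                    # Uses d_type when available, lstat only when it must.
                    entry.is_dir = de.is_dir(follow_symlinks=False)
            except OSError as exc:
                if onerror is None:
                    raise
                onerror(exc)
                continue  # skip entries we could not stat
            yield entry


def get_tree_size(path, onerror=None):
    total = 0
    # Error handling is just the onerror=... keyword: a line fragment.
    for entry in scan(path, ensure_lstat=ENSURE_LSTAT, onerror=onerror):
        if entry.is_dir:
            total += get_tree_size(entry.full_name, onerror)
        else:
            total += entry.lstat_result.st_size
    return total
```

With NAMES_ONLY the wrapper costs about what listdir() does; with ENSURE_TYPE it only pays for lstat() where the type is unknown, which is why a separate ensure_type flag becomes unnecessary under this design.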

Finally:

Thank you for writing scandir, and this PEP. Excellent work.

Oh, and +1 for option 2, slightly modified. :)

-- Ethan


