[Python-Dev] Updates to PEP 471, the os.scandir() proposal
Ethan Furman ethan at stoneleaf.us
Wed Jul 9 03:31:55 CEST 2014
On 07/08/2014 06:08 PM, Ben Hoyt wrote:
>> Just like an attribute does not imply a system call, having a method named 'isdir' /does/ imply a system call, and not having one can be just as misleading.
>
> Why does a method imply a system call? os.path.join() and str.lower() don't make system calls. Isn't it just a matter of clear documentation? Anyway -- less philosophical discussion below.
In this case because the names are exactly the same as the os versions which /do/ make a system call.
> I presume you're suggesting that is_dir/is_file/is_symlink should be regular attributes, and accessing them should never do a system call. But what if the system doesn't support d_type (e.g. Solaris) or the d_type value is DT_UNKNOWN (can happen on Linux, OS X, BSD)? The options are:
So if I'm finally understanding the root problem here:
listdir returns a list of strings, one for each filename and one for each directory, and keeps no other O/S supplied info.
os.walk, which uses listdir, then needs to go back to the O/S and refetch the thrown-away information
so it's slow.
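A minimal sketch of that pattern, using only os.listdir() and os.path.isdir(), both of which exist today. The helper names split_dirs_files() and demo() are made up for illustration, and this is not the actual os.walk source, just the shape of the problem:

```python
import os
import tempfile


def split_dirs_files(top):
    """Classify entries the way a listdir-based walk has to (illustrative only)."""
    dirs, files = [], []
    extra_calls = 0
    for name in os.listdir(top):        # names only; no type info is kept
        full = os.path.join(top, name)
        extra_calls += 1                # one more trip to the O/S per entry
        if os.path.isdir(full):         # stat() under the hood
            dirs.append(name)
        else:
            files.append(name)
    return sorted(dirs), sorted(files), extra_calls


def demo():
    """Build a tiny throwaway tree and classify its entries."""
    with tempfile.TemporaryDirectory() as top:
        os.mkdir(os.path.join(top, "sub"))
        for name in ("a.txt", "b.txt"):
            with open(os.path.join(top, name), "w") as f:
                f.write("x")
        return split_dirs_files(top)
```

Every entry costs an extra call, even on platforms where the directory read already reported whether the entry is a directory.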
The solution:
- have scandir /not/ throw away the O/S supplied info
and the new problem:
- not all O/Ses provide the same (or any) extra info about the directory entries
Have I got that right?
If so, I still like the attribute idea better (surprise!), we just need to revisit the 'ensure_lstat' (or whatever it's called) parameter: instead of a true/false value, it could have a scale:
0 = whatever the O/S gives us
1 = at least the is_dir/is_file (whatever the other normal one is), and if the O/S doesn't give it to us for free then call lstat
2 = we want it all -- call lstat if necessary on this platform
After all, the programmer should know up front how much of the extra info will be needed for the work that is trying to be done.
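To make the scale concrete, here is a rough sketch of how the three levels could behave. Everything in it is hypothetical: scandir_sketch(), the Entry tuple, and demo() are names invented for illustration, the attribute names are borrowed from the get_tree_size() example in this thread, and it is built on os.listdir() plus os.lstat() to simulate a platform that hands back no type information for free. It is not the proposed implementation.

```python
import os
import stat
import tempfile
from collections import namedtuple

# Hypothetical entry type; None means "the O/S did not tell us".
Entry = namedtuple("Entry", "name full_name is_dir is_file lstat_result")


def scandir_sketch(path, ensure_lstat=0):
    """Yield entries, doing extra work only up to the requested level."""
    for name in os.listdir(path):
        full_name = os.path.join(path, name)
        # Level 0: only what came for free, which is nothing on this
        # simulated platform.
        is_dir = is_file = st = None
        if ensure_lstat >= 1:
            # Levels 1 and 2: the type info was not free, so call lstat.
            st = os.lstat(full_name)
            is_dir = stat.S_ISDIR(st.st_mode)
            is_file = stat.S_ISREG(st.st_mode)
        # Only level 2 promises the full lstat result; this sketch leaves
        # it unset at level 1 to show what is and is not guaranteed.
        yield Entry(name, full_name, is_dir, is_file,
                    st if ensure_lstat >= 2 else None)


def demo():
    """Scan a tiny throwaway tree at each level and summarise the entries."""
    with tempfile.TemporaryDirectory() as top:
        os.mkdir(os.path.join(top, "sub"))
        with open(os.path.join(top, "a.txt"), "w") as f:
            f.write("hello")
        return {
            level: sorted(
                (e.name, e.is_dir, e.lstat_result is not None)
                for e in scandir_sketch(top, level)
            )
            for level in (0, 1, 2)
        }
```

On a platform that does supply the type for free, level 1 would cost nothing extra, which is the point of letting the caller pick the level.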
> We have a choice before us, a fork in the road. :-) We can choose one of these options for the scandir API:
>
> 1) The current PEP 471 approach. This solves the issue with d_type being missing or DT_UNKNOWN, it doesn't require onerror, and it's a really tidy API that doesn't explode with AttributeErrors if you write code on Windows (without thinking too hard) and then move to Linux. I think all of these points are important -- the cross-platform one not the least, because we want to make it easy, even trivial, for people to write cross-platform code.
Yes, but we don't want a function that sucks equally on all platforms. ;)
> 2) Nick Coghlan's model of only fetching the lstat value if ensure_lstat=True, and including an onerror callback for error handling when scandir calls lstat internally. However, as described, we'd also need an ensure_type=True option, so that scandir() isn't way slower than listdir() if you actually don't want the is_X values and d_type is missing/unknown.
With the multi-level version of 'ensure_lstat' we do not need an extra 'ensure_type'.
> For reference, here's what get_tree_size() looks like with this approach, not including error handling with onerror:
>
>     def get_tree_size(path):
>         total = 0
>         for entry in os.scandir(path, ensure_lstat=1):
>             if entry.is_dir:
>                 total += get_tree_size(entry.full_name)
>             else:
>                 total += entry.lstat_result.st_size
>         return total
And if we added the onerror here it would be a line fragment, as opposed to the extra four lines (at least) for the try/except in the first example (which I cut).
Finally:
Thank you for writing scandir, and this PEP. Excellent work.
Oh, and +1 for option 2, slightly modified. :)
--
Ethan