[Python-Dev] PEP 471 (scandir): Poll to choose the implementation (full C or C+Python)

Ben Hoyt benhoyt at gmail.com
Fri Feb 13 14:35:00 CET 2015


> * C implementation: scandir is at least 3.5x faster than listdir, up
>   to 44.6x faster on Windows
> * C+Python implementation: scandir is not really faster than listdir,
>   between 1.3x and 1.4x faster

> So amusingly, the bottleneck is not so much the cost of system calls,
> but the cost of Python wrappers around system calls.

Yes, that's basically right. Or put another way, the cost of the extra system calls is dwarfed by the cost of wrapping things in Python.

Victor's given a great summary of the issues at the top of this thread, and I'm definitely for the all-C version -- otherwise we gain a bunch of speed by not calling stat(), but then lose most of it again with the Python wrapping. As Victor noted, the rationale for PEP 471 has always been about performance, and if we don't have much of that (especially on Linux), it's not nearly as worthwhile.
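To make the "not calling stat()" point concrete, here is a minimal sketch of the two ways to find the subdirectories of a path. It assumes the os.scandir() API as proposed in PEP 471 (with the context-manager form, which needs Python 3.6 or later); the function names subdirs_with_listdir and subdirs_with_scandir are made up for illustration and are not part of any implementation discussed in this thread.

```python
import os


def subdirs_with_listdir(path):
    """List subdirectory names using listdir().

    os.path.isdir() makes a stat() system call for every entry,
    on top of the directory read itself.
    """
    names = []
    for name in os.listdir(path):
        if os.path.isdir(os.path.join(path, name)):
            names.append(name)
    return sorted(names)


def subdirs_with_scandir(path):
    """List subdirectory names using scandir().

    entry.is_dir() can usually answer from the file type that the OS
    returned while reading the directory, so in most cases no extra
    stat() call is needed (symlinks and some file systems are exceptions).
    """
    names = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir():
                names.append(entry.name)
    return sorted(names)
```

Both functions should return the same list for a given directory; the difference is only in how many system calls happen along the way, and how much of that saving survives depends on how cheap the per-entry wrapping is.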

Re maintenance of the C code: yes, the pure C version is about twice as many lines as the half Python version (~800 vs ~400), but I think Nick makes a good point here: "This isn't code I'd expect us to have to change very often, so the maintenance risks associated with the pure C implementation seem low." We have to vet this code thoroughly basically once, now. :-)

If we go ahead with the all-C approach, I'd be in favour of refactoring a little and putting the new scandir code into a separate C file. There are two ways to do this: a) sticking with a single Python module and just referencing the non-static functions in scandir.c from posixmodule.c, or b) sharing some functions but making _scandir.c its own importable module. Option (a) is somewhat simpler, as the module setup isn't done twice, but I don't know if there's a precedent for that way of doing things.

-Ben


