[Python-Dev] PEP 471 (scandir): Poll to choose the implementation (full C or C+Python)
Guido van Rossum guido at python.org
Fri Feb 13 18:31:44 CET 2015
I vote for the C implementation.
On Fri, Feb 13, 2015 at 2:07 AM, Victor Stinner <victor.stinner at gmail.com> wrote:
Hi,
TL;DR: are you OK with adding 800 lines of C code for os.scandir(), which is 4x faster than os.listdir() when the file type is checked?

I accepted PEP 471 (os.scandir) a few months ago, but it is not implemented yet in Python 3.5 because I haven't chosen an implementation. Ben Hoyt wrote several implementations:

- full C: os.scandir() and DirEntry are written in C (no change to os.py)
- C+Python: os.scandir() (a wrapper for opendir/readdir and FindFirstFileW/FindNextFileW) in C, DirEntry in Python
- ctypes: os.scandir() and DirEntry fully implemented in Python

I'm not interested in the ctypes implementation. It's useful for a third-party project hosted on PyPI, but for CPython I prefer to wrap C functions using C code.
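"When the file type is checked" refers to the pattern below. This is a minimal sketch, not code from any of Ben Hoyt's implementations; the helper names `subdirs_listdir`, `subdirs_scandir` and `subdirs` are hypothetical. The listdir version pays one os.stat() system call per entry, while DirEntry.is_dir() can usually answer from the type information already returned by readdir() or FindNextFileW(). Code that must also run on Python 3.4 and older, where os.scandir() does not exist, has to keep both paths.

```python
import os
import stat


def subdirs_listdir(path):
    """Pre-3.5 pattern: one os.stat() system call per directory entry."""
    result = []
    for name in os.listdir(path):
        try:
            # os.stat() follows symlinks, like DirEntry.is_dir() does by default.
            st = os.stat(os.path.join(path, name))
        except FileNotFoundError:
            # Entry vanished or is a dangling symlink: treat it as "not a directory".
            continue
        if stat.S_ISDIR(st.st_mode):
            result.append(name)
    return result


def subdirs_scandir(path):
    """PEP 471 pattern: is_dir() can often avoid an extra stat() call."""
    # The iterator is fully consumed, so its directory handle gets closed.
    return [entry.name for entry in os.scandir(path) if entry.is_dir()]


def subdirs(path):
    # Hypothetical compatibility wrapper: use scandir when available (3.5+),
    # otherwise fall back to the listdir + stat code path.
    if hasattr(os, "scandir"):
        return subdirs_scandir(path)
    return subdirs_listdir(path)
```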
In short, the C implementation is faster than the C+Python implementation. Issue #22524 (*) is full of benchmark numbers. IMO the most interesting benchmark compares os.listdir() + os.stat() with os.scandir() + DirEntry.is_dir(). Let me try to summarize the results of this benchmark:

* C implementation: scandir is at least 3.5x faster than listdir, and up to 44.6x faster on Windows
* C+Python implementation: scandir is not really faster than listdir, only between 1.3x and 1.4x faster

(*) http://bugs.python.org/issue22524

Ben Hoyt reminded me that os.scandir() (PEP 471) doesn't add any new feature: pathlib already provides a nice API on top of the os and os.path modules. (You may even notice that DirEntry has far fewer methods ;-)) The main (only?) purpose of the PEP is performance. If os.scandir() is "only" 1.4x faster, I don't think it is worth using os.scandir() in an application. I guess that all applications and libraries will want to keep compatibility with Python 3.4 and older, and so will have to duplicate the code anyway to keep an os.listdir() + os.stat() path. So is it worth duplicating code for such a small speedup?

Now I see 3 choices:

- take the full C implementation, because it's much faster (at least 3.4x faster!)
- reject the whole PEP 471 (not nice), because it adds too much code for a minor speedup (not true on Windows: up to 44x faster!)
- take the C+Python implementation, because maintenance matters more than performance (only 1.3x faster, sorry)

=> IMO the best option is to take the C implementation. What do you think?

I'm concerned by the length of the C code: the full C implementation adds ~800 lines of C code to posixmodule.c. This file is already the longest C file in CPython. I don't want to make it longer, but I'm not motivated to start splitting it. The last time I proposed splitting a file (unicodeobject.c), some developers complained that it makes searching harder. I don't understand this; there are so many tools to navigate C code.
But it was enough for me to give up on this idea. An alternative is to add a new scandir.c module to host the new C code, and share some code with posixmodule.c: remove the "static" keyword from the required C functions (the functions that convert Windows attributes to an os.stat_result object). That's a reasonable choice. What do you think?

FYI I ran the benchmark on different hardware (SSD, HDD, tmpfs), file systems (ext4, tmpfs, NFS/ext4) and operating systems (Linux, Windows).

Victor
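A rough way to reproduce the os.listdir() + os.stat() versus os.scandir() + DirEntry.is_dir() comparison on your own machine is sketched below. It is an illustration under stated assumptions, not the benchmark script from issue #22524: the function names, the tree of 500 files and 500 directories, and the timeit parameters are all arbitrary choices. Results depend heavily on the OS, the file system and whether the directory is already in the OS cache, which is why running it on several setups matters.

```python
import os
import stat
import tempfile
import timeit


def listdir_stat(path):
    # os.listdir() + os.stat(): one extra system call per entry.
    count = 0
    for name in os.listdir(path):
        if stat.S_ISDIR(os.stat(os.path.join(path, name)).st_mode):
            count += 1
    return count


def scandir_is_dir(path):
    # os.scandir() + DirEntry.is_dir(): the file type usually comes straight
    # from readdir() (d_type) or FindNextFileW() (file attributes).
    return sum(1 for entry in os.scandir(path) if entry.is_dir())


def compare(path, repeat=5, number=10):
    """Return the best total time in seconds for each approach."""
    t_listdir = min(timeit.repeat(lambda: listdir_stat(path), repeat=repeat, number=number))
    t_scandir = min(timeit.repeat(lambda: scandir_is_dir(path), repeat=repeat, number=number))
    return t_listdir, t_scandir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Arbitrary test tree: 500 subdirectories and 500 empty files.
        for i in range(500):
            os.mkdir(os.path.join(root, f"dir{i}"))
            open(os.path.join(root, f"file{i}"), "w").close()
        t1, t2 = compare(root)
        print(f"listdir+stat: {t1:.4f}s  scandir+is_dir: {t2:.4f}s  ratio: {t1 / t2:.1f}x")
```

Taking the minimum of several `timeit.repeat()` runs is the usual way to reduce noise from other processes; a freshly created temporary directory will be warm in the cache, so it does not reproduce cold-cache or NFS conditions.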
--Guido van Rossum (python.org/~guido)