msg257353 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2016-01-02 18:10 |
Currently, os.scandir() on Unix is implemented using opendir()/readdir()/closedir(). It accepts bytes and str pathnames. But most functions in the os module that accept a pathname also accept an open file descriptor. This feature could be implemented in scandir() by using fdopendir() instead of opendir(). That would make it possible to add support for the dir_fd parameter in scandir(), which in turn would allow implementing os.fwalk() with scandir() and writing a more efficient implementation of os.walk() (because we would no longer need to walk a long path for deep directories, see ). |
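As a rough illustration of the request, here is a minimal sketch of scanning a directory through an open file descriptor. It assumes Python 3.7 or later on Unix, where os.scandir() accepts a descriptor; the helper name list_names_via_fd and the temporary directory layout are made up for the example. The sketch also assumes the caller keeps ownership of the descriptor and closes it itself.

```python
import os
import tempfile

def list_names_via_fd(directory):
    """Return the sorted entry names of `directory`, scanned via a file descriptor.

    Hypothetical helper for illustration; assumes os.scandir() accepts an fd
    (Python 3.7+ on Unix).
    """
    fd = os.open(directory, os.O_RDONLY)
    try:
        # The scandir iterator is a context manager; leaving the block
        # releases the iterator's own resources.
        with os.scandir(fd) as entries:
            return sorted(entry.name for entry in entries)
    finally:
        # Assumption: the caller still owns `fd` and is responsible for it.
        os.close(fd)

# Build a throwaway directory with one file and one subdirectory.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "a.txt"), "w").close()
    os.mkdir(os.path.join(tmp, "sub"))
    names = list_names_via_fd(tmp)
    print(names)
```

When a descriptor is passed, each entry's path attribute is just its name, so any follow-up call on an entry needs dir_fd or the original directory path.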
|
|
msg257380 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2016-01-02 22:38 |
Yeah, this was discussed when PEP 471 was designed, but it was already hard to design os.scandir() without supporting a file descriptor as its parameter. Supporting one is more complex because we have to handle the lifetime of the file descriptor, especially if it's exposed in a public attribute. |
|
|
msg257382 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2016-01-02 22:42 |
Supporting file descriptors was also discussed when pathlib.Path was designed, but there were similar questions about the lifetime of the file descriptor. (Who is allowed to close it? When? Is it OK to close it using os.close()? etc.) |
|
|
msg280177 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2016-11-06 21:53 |
Proposed patch adds support for file descriptors in os.scandir() and implements os.fwalk() with os.scandir(). The effect of using os.scandir() in os.fwalk():

$ ./python -m timeit -n1 -r5 -s 'import os' -- 'list(os.walk("/usr/lib"))'
1 loop, best of 5: 934 msec per loop

$ ./python -m timeit -n1 -r5 -s 'import os' -- 'list(os.walk("/usr/lib", topdown=False))'
1 loop, best of 5: 718 msec per loop

$ ./python -m timeit -n1 -r5 -s 'import os' -- 'list(os.fwalk("/usr/lib"))'
Unpatched: 1 loops, best of 5: 1.78 sec per loop
Patched:   1 loop, best of 5: 934 msec per loop

$ ./python -m timeit -n1 -r5 -s 'import os' -- 'list(os.fwalk("/usr/lib", topdown=False))'
Unpatched: 1 loops, best of 5: 1.76 sec per loop
Patched:   1 loop, best of 5: 947 msec per loop |
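For readers unfamiliar with os.fwalk(), the following sketch shows the kind of descriptor-relative traversal that the timings above exercise. It is only an example of using the public API on Unix, not part of the patch; the helper name total_size and the file layout are invented for illustration.

```python
import os
import tempfile

def total_size(top):
    """Sum the sizes of all regular files under `top` using os.fwalk().

    Hypothetical helper for illustration; os.fwalk() is Unix-only.
    """
    total = 0
    # os.fwalk() yields a 4-tuple; the last item is an open descriptor
    # for the directory currently being visited.
    for root, dirs, files, rootfd in os.fwalk(top):
        for name in files:
            # Stat relative to the directory descriptor, so no long
            # path has to be resolved again for deep directories.
            st = os.stat(name, dir_fd=rootfd, follow_symlinks=False)
            total += st.st_size
    return total

with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    with open(os.path.join(tmp, "a.bin"), "wb") as f:
        f.write(b"abc")          # 3 bytes
    with open(os.path.join(tmp, "sub", "b.bin"), "wb") as f:
        f.write(b"hello")        # 5 bytes
    size = total_size(tmp)
    print(size)
```

The descriptor yielded by os.fwalk() is only valid until the next iteration step, so it should be used inside the loop body or duplicated with os.dup() if it must outlive it.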
|
|
msg280663 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2016-11-12 17:28 |
Thank you for the review, Josh. The updated patch addresses your comments and adds a few more micro-optimizations. |
|
|
msg281251 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2016-11-20 06:35 |
Resolved conflicts in the documentation. |
|
|
msg289079 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2017-03-06 09:10 |
I wonder whether it is possible to implement this feature on Windows. |
|
|
msg289080 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2017-03-06 09:21 |
> I'm wondering is it possible to implement this feature on Windows?

On Windows, scandir() is implemented with FindFirstFile() which takes strings. This function creates a handle which should then be passed to FindNextFile(). There is no similar function taking a directory handle, so it's not possible to implement os.scandir(fd) on Windows.

It seems like the gnulib emulates fdopendir() on Windows, and its documentation contains warnings:

https://www.gnu.org/software/gnulib/manual/html_node/fdopendir.html

"But the replacement function is not safe to be used in libraries and is not multithread-safe. Also, the replacement does not guarantee that ‘dirfd(fdopendir(n))==n’ (dirfd might fail, or return a different file descriptor than n)." |
|
|
msg289153 - (view) |
Author: Eryk Sun (eryksun) *  |
Date: 2017-03-07 06:26 |
> There is no similar function taking a directory handle

In 3.5+ the CRT has O_OBTAIN_DIR (0x2000) for opening a directory, i.e. to call CreateFile with backup semantics. A directory can be read via GetFileInformationByHandleEx [1] using the information classes FileIdBothDirectoryRestartInfo and FileIdBothDirectoryInfo. This info class is just a simplified wrapper around the more powerful system call NtQueryDirectoryFile [2]. The implementation details could be hidden behind _Py_opendir, _Py_fdopendir, _Py_readdir, and _Py_closedir -- allowing a common implementation of the high-level listdir() and scandir() functions. I wrote a ctypes prototype of listdir() along these lines.

One feature that's lost in using GetFileInformationByHandleEx to list a directory is the ability to do wildcard filtering. However, Python's listdir and scandir never use wildcard filtering, so it's no real loss. FindFirstFile implements this feature via the FileName parameter of NtQueryDirectoryFile. First it translates DOS wildcards to NT's set of 5 wildcards. There's the native NT '*' and '?', plus the quirky semantics of MS-DOS via '<', '>', and '"', i.e. DOS_STAR, DOS_QM, and DOS_DOT. See FsRtlIsNameInExpression [3] for a description of these wildcard characters.

[1]: https://msdn.microsoft.com/en-us/library/aa364953
[2]: https://msdn.microsoft.com/en-us/library/ff567047
[3]: https://msdn.microsoft.com/en-us/library/ff546850 |
|
|
msg289485 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2017-03-12 08:15 |
Thank you for your investigation, Eryk. Helpful as always. Since I have no access to Windows, I left this feature Unix-only. |
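Because the feature is Unix-only, portable code may want to check for it at run time. The sketch below assumes Python 3.7 or later, where os.scandir is listed in os.supports_fd on platforms that allow a descriptor argument; the helper name scan_names is hypothetical.

```python
import os
import tempfile

def scan_names(path):
    """Return sorted entry names, using a file descriptor where supported.

    Hypothetical helper for illustration only.
    """
    if os.scandir in os.supports_fd:
        # Platforms with fd support for scandir (Unix in this issue).
        fd = os.open(path, os.O_RDONLY)
        try:
            with os.scandir(fd) as entries:
                return sorted(entry.name for entry in entries)
        finally:
            os.close(fd)
    # Fallback for platforms without fd support (e.g. Windows).
    with os.scandir(path) as entries:
        return sorted(entry.name for entry in entries)

with tempfile.TemporaryDirectory() as tmp:
    for filename in ("b.txt", "a.txt"):
        open(os.path.join(tmp, filename), "w").close()
    result = scan_names(tmp)
    print(result)
```

Both branches return the same names, so callers do not need to know which one was taken.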
|
|
msg290820 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2017-03-30 06:12 |
New changeset ea720fe7e99d68924deab38de955fe97f87e2b29 by Serhiy Storchaka in branch 'master':
bpo-25996: Added support of file descriptors in os.scandir() on Unix. (#502)
https://github.com/python/cpython/commit/ea720fe7e99d68924deab38de955fe97f87e2b29 |
|
|