Issue 25996: Add support of file descriptor in os.scandir()

Created on 2016-01-02 18:10 by serhiy.storchaka, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded
os-scandir-fd.patch serhiy.storchaka, 2016-11-06 21:53
os-scandir-fd-2.patch serhiy.storchaka, 2016-11-12 17:28
os-scandir-fd-3.patch serhiy.storchaka, 2016-11-20 06:35
Pull Requests
URL Status Linked
PR 502 merged serhiy.storchaka, 2017-03-06 09:08
Messages (11)
msg257353 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-01-02 18:10
Currently os.scandir() on Unix is implemented using opendir()/readdir()/closedir(). It accepts bytes and str pathnames. But most functions in the os module that accept a pathname also accept an open file descriptor. This feature could be implemented in scandir() by using fdopendir() instead of opendir(). That would make it possible to add support for the dir_fd parameter in scandir(), to implement os.fwalk() with scandir(), and to make os.walk() more efficient (because we would no longer need to walk a long path for deep directories, see ).
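A minimal sketch of what the requested feature looks like from Python, assuming a Unix build of Python 3.7 or later where the change has landed. The helper name names_via_fd is hypothetical and only for illustration. Platform support is checked through os.supports_fd, with a fallback to the plain pathname.

```python
import os
import tempfile


def names_via_fd(path):
    """Return the sorted entry names of *path*, scanning via a file descriptor.

    names_via_fd is a hypothetical helper, not part of the os module.
    """
    if os.scandir not in os.supports_fd:
        # No fd support on this platform (e.g. Windows): scan by pathname.
        with os.scandir(path) as entries:
            return sorted(entry.name for entry in entries)
    fd = os.open(path, os.O_RDONLY)
    try:
        with os.scandir(fd) as entries:
            return sorted(entry.name for entry in entries)
    finally:
        # The caller opened the descriptor, so the caller closes it.
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.txt", "b.txt"):
        open(os.path.join(tmp, name), "w").close()
    print(names_via_fd(tmp))
```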
msg257380 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-01-02 22:38
Yeah, it was discussed when PEP 471 was designed, but it was already hard to design os.scandir() even without supporting an fd as the os.scandir() parameter. It's more complex because we have to handle the lifetime of the file descriptor, especially if it's exposed in a public attribute.
msg257382 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-01-02 22:42
Supporting file descriptors was also discussed when pathlib.Path was designed, but there were similar questions about the lifetime of the file descriptor. (Who is able to close it? When? Is it OK to close it using os.close? etc.)
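One possible answer to the lifetime questions above is "the caller owns the descriptor". The sketch below assumes a Unix build of Python 3.7 or later, and it assumes that os.scandir() leaves the caller's descriptor open after the iterator is closed. That is an assumption about the implementation that was eventually merged, not something stated in this thread. The helper name scan_keeping_fd_open is hypothetical.

```python
import os
import tempfile


def scan_keeping_fd_open(fd):
    """Scan the directory behind *fd* and return sorted names.

    Hypothetical helper. It never closes *fd*; that stays the caller's job.
    """
    with os.scandir(fd) as entries:
        names = sorted(entry.name for entry in entries)
    # Assumption being illustrated: closing the scandir iterator did not
    # close the caller's descriptor, so fstat() on it still succeeds.
    os.fstat(fd)
    return names


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "example.txt"), "w").close()
    fd = os.open(tmp, os.O_RDONLY)
    try:
        print(scan_keeping_fd_open(fd))
    finally:
        os.close(fd)  # the owner closes the descriptor exactly once
```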
msg280177 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-06 21:53
The proposed patch adds support for file descriptors in os.scandir() and implements os.fwalk() with os.scandir().

The effect of using os.scandir() in os.fwalk():

$ ./python -m timeit -n1 -r5 -s 'import os' -- 'list(os.walk("/usr/lib"))'
1 loop, best of 5: 934 msec per loop

$ ./python -m timeit -n1 -r5 -s 'import os' -- 'list(os.walk("/usr/lib", topdown=False))'
1 loop, best of 5: 718 msec per loop

$ ./python -m timeit -n1 -r5 -s 'import os' -- 'list(os.fwalk("/usr/lib"))'
Unpatched: 1 loops, best of 5: 1.78 sec per loop
Patched: 1 loop, best of 5: 934 msec per loop

$ ./python -m timeit -n1 -r5 -s 'import os' -- 'list(os.fwalk("/usr/lib", topdown=False))'
Unpatched: 1 loops, best of 5: 1.76 sec per loop
Patched: 1 loop, best of 5: 947 msec per loop
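To illustrate the idea of walking a tree with os.scandir() on directory descriptors, here is a simplified sketch. It is not the code from the patch: the names fd_walk and _fd_walk are hypothetical, it only walks top-down, it yields no descriptor to the caller, and it assumes a Unix build of Python 3.7 or later. Unlike os.walk(), it lists symlinks to directories under the file names.

```python
import os
import tempfile


def fd_walk(top):
    """Yield (dirpath, dirnames, filenames), scanning through directory fds."""
    top_fd = os.open(top, os.O_RDONLY)
    try:
        yield from _fd_walk(top_fd, top)
    finally:
        os.close(top_fd)


def _fd_walk(dir_fd, dirpath):
    dirnames, filenames = [], []
    with os.scandir(dir_fd) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                dirnames.append(entry.name)
            else:
                filenames.append(entry.name)
    yield dirpath, dirnames, filenames
    for name in dirnames:
        try:
            # Open the child relative to its parent's descriptor, so the
            # kernel never has to resolve the full (possibly long) path.
            child_fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
        except OSError:
            continue  # skip directories that vanished or are unreadable
        try:
            yield from _fd_walk(child_fd, os.path.join(dirpath, name))
        finally:
            os.close(child_fd)


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "a", "b"))
    open(os.path.join(tmp, "a", "file.txt"), "w").close()
    for dirpath, dirnames, filenames in fd_walk(tmp):
        print(os.path.relpath(dirpath, tmp), dirnames, filenames)
```

The dirpath strings are built only for reporting; all directory access goes through descriptors.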
msg280663 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-12 17:28
Thank you for the review, Josh. The updated patch addresses your comments and adds a few more micro-optimizations.
msg281251 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-20 06:35
Resolved conflicts in the documentation.
msg289079 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-06 09:10
I'm wondering is it possible to implement this feature on Windows?
msg289080 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-03-06 09:21
> I'm wondering is it possible to implement this feature on Windows?

On Windows, scandir() is implemented with FindFirstFile(), which takes strings. This function creates a handle which should then be passed to FindNextFile(). There is no similar function taking a directory handle, so it's not possible to implement os.scandir(fd) on Windows.

It seems that gnulib emulates fdopendir() on Windows, and its documentation contains warnings:
https://www.gnu.org/software/gnulib/manual/html_node/fdopendir.html

"But the replacement function is not safe to be used in libraries and is not multithread-safe. Also, the replacement does not guarantee that ‘dirfd(fdopendir(n))==n’ (dirfd might fail, or return a different file descriptor than n)."
msg289153 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2017-03-07 06:26
> There is no similar function taking a directory handle

In 3.5+ the CRT has O_OBTAIN_DIR (0x2000) for opening a directory, i.e. to call CreateFile with backup semantics. A directory can be read via GetFileInformationByHandleEx [1] using the information classes FileIdBothDirectoryRestartInfo and FileIdBothDirectoryInfo. These info classes are just a simplified wrapper around the more powerful system call NtQueryDirectoryFile [2]. The implementation details could be hidden behind _Py_opendir, _Py_fdopendir, _Py_readdir, and _Py_closedir -- allowing a common implementation of the high-level listdir() and scandir() functions. I wrote a ctypes prototype of listdir() along these lines.

One feature that's lost in using GetFileInformationByHandleEx to list a directory is the ability to do wildcard filtering. However, Python's listdir and scandir never use wildcard filtering, so it's no real loss. FindFirstFile implements this feature via the FileName parameter of NtQueryDirectoryFile. First it translates DOS wildcards to NT's set of 5 wildcards. There's the native NT '*' and '?', plus the quirky semantics of MS-DOS via '<', '>', and '"', i.e. DOS_STAR, DOS_QM, and DOS_DOT. See FsRtlIsNameInExpression [3] for a description of these wildcard characters.

[1]: https://msdn.microsoft.com/en-us/library/aa364953
[2]: https://msdn.microsoft.com/en-us/library/ff567047
[3]: https://msdn.microsoft.com/en-us/library/ff546850
msg289485 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-12 08:15
Thank you for your investigation, Eryk. Helpful as always. Since I have no access to Windows, I left this feature Unix-only.
msg290820 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-30 06:12
New changeset ea720fe7e99d68924deab38de955fe97f87e2b29 by Serhiy Storchaka in branch 'master':
bpo-25996: Added support of file descriptors in os.scandir() on Unix. (#502)
https://github.com/python/cpython/commit/ea720fe7e99d68924deab38de955fe97f87e2b29
History
Date User Action Args
2022-04-11 14:58:25 admin set github: 70184
2017-03-30 06:21:37 serhiy.storchaka set status: open -> closed; resolution: fixed; stage: patch review -> resolved
2017-03-30 06:12:33 serhiy.storchaka set messages: +
2017-03-12 08:15:07 serhiy.storchaka set messages: +
2017-03-07 06:26:47 eryksun set nosy: + eryksun; messages: +
2017-03-06 09:21:26 vstinner set messages: +
2017-03-06 09:10:38 serhiy.storchaka set messages: +
2017-03-06 09:08:38 serhiy.storchaka set pull_requests: + pull_request410
2016-11-20 06:35:46 serhiy.storchaka set files: + os-scandir-fd-3.patch; messages: +
2016-11-12 17:28:36 serhiy.storchaka set files: + os-scandir-fd-2.patch; messages: +
2016-11-06 21:53:54 serhiy.storchaka set files: + os-scandir-fd.patch; versions: + Python 3.7, - Python 3.6; messages: +; keywords: + patch; stage: patch review
2016-11-02 08:34:13 serhiy.storchaka set assignee: serhiy.storchaka; dependencies: + Convert os.scandir to Argument Clinic
2016-10-31 23:53:40 serhiy.storchaka link issue28564 dependencies
2016-05-22 18:01:13 abacabadabacaba set nosy: + abacabadabacaba
2016-01-02 22:42:01 vstinner set messages: +
2016-01-02 22:38:59 vstinner set messages: +
2016-01-02 18:10:38 serhiy.storchaka create