Issue 39907: pathlib.Path.iterdir() wastes memory by using os.listdir() rather than os.scandir()

Created on 2020-03-09 00:17 by barneygale, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
PR 18865 (closed), linked by barneygale on 2020-03-09 00:22
Messages (9)
msg363689 - Author: Barney Gale (barneygale) - Date: 2020-03-09 00:17
`pathlib.Path.iterdir()` uses `os.listdir()` rather than `os.scandir()`. I think using `os.listdir()` here has a small performance cost, per PEP 471:

> It returns a generator instead of a list, so that scandir acts as a true iterator instead of returning the full list immediately.

As `scandir()` is already available from `_NormalAccessor`, it's a simple patch to use `scandir()` instead.
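The change being proposed can be sketched with two hypothetical stand-ins for `iterdir()`. These are simplified illustrations, not pathlib's actual code, which goes through its own accessor and path-construction helpers. The first builds the full name list up front with `os.listdir()`; the second yields entries one at a time from `os.scandir()`:

```python
import os
from pathlib import Path

def iterdir_listdir(path):
    """Hypothetical listdir()-based iterdir(): the whole name list is
    built in memory before the first path is yielded."""
    path = Path(path)
    for name in os.listdir(path):
        yield path / name

def iterdir_scandir(path):
    """Hypothetical scandir()-based iterdir(): entries are read lazily,
    so the directory handle stays open until iteration finishes."""
    path = Path(path)
    with os.scandir(path) as it:
        for entry in it:
            yield path / entry.name
```

Both omit `.` and `..`. The `scandir()` version keeps a directory handle open between yields; if the caller stops iterating early, the handle stays open until the generator is closed or garbage-collected.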
msg363721 - Author: Karthikeyan Singaravelan (xtreak) (Python committer) - Date: 2020-03-09 12:03
This optimisation was also hinted at in https://bugs.python.org/issue26032#msg257653
msg363722 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) - Date: 2020-03-09 12:32
It is not so easy. There was a reason why it was not done earlier: scandir() wastes a more limited resource than memory, namely file descriptors. It should also be closed properly rather than left to the garbage collector. Consider the example:

    def traverse(path, visit):
        for child in path.iterdir():
            if child.is_dir():
                traverse(child, visit)  # recurse into the subdirectory
            else:
                visit(child)

With your optimization it may fail with OSError: Too many open files.
msg363734 - Author: Barney Gale (barneygale) - Date: 2020-03-09 14:30
Ah, right you are! The globbing helpers call `list(os.scandir(...))` - perhaps we should do the same here?
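A minimal sketch of that suggestion, written as a hypothetical stand-alone helper `iterdir_eager()` rather than as pathlib's actual code. The whole directory is read inside a `with` block, so the handle is closed before the first path reaches the caller and recursion no longer accumulates open descriptors:

```python
import os
from pathlib import Path

def iterdir_eager(path):
    """Hypothetical iterdir() variant: read every entry with scandir(),
    close the directory handle, then yield paths from the saved list."""
    path = Path(path)
    with os.scandir(path) as it:
        names = [entry.name for entry in it]
    # The handle is already closed here; only the name list stays alive.
    for name in names:
        yield path / name
```

This still materialises the full list in memory, just as `os.listdir()` does, so the memory saving that motivated the issue disappears.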
msg363736 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) - Date: 2020-03-09 14:46
It would be a slower and less reliable implementation of os.listdir().
msg363741 - Author: Barney Gale (barneygale) - Date: 2020-03-09 14:56
Less reliable how? Doesn't appear any slower:

    barney.gale@heilbron:~$ python3 -m timeit -s "import os; os.listdir('/usr/local')"
    100000000 loops, best of 3: 0.0108 usec per loop
    barney.gale@heilbron:~$ python3 -m timeit -s "import os; list(os.scandir('/usr/local'))"
    100000000 loops, best of 3: 0.00919 usec per loop
msg363742 - Author: Rémi Lapeyre (remi.lapeyre) - Date: 2020-03-09 15:19
This is not how timeit works: you just measured the time taken by an empty loop. See `python3 -m timeit -h` for how to call it. I think a correct invocation would be:

    (venv) ➜ ~ python3 -m timeit -s 'from os import scandir' "list(scandir('/usr/local'))"
    10000 loops, best of 5: 24.3 usec per loop
    (venv) ➜ ~ python3 -m timeit -s 'from os import listdir' "listdir('/usr/local')"
    10000 loops, best of 5: 22.2 usec per loop

So it looks like scandir has a small overhead when accumulating all results and not using the extra info it returns.
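The same mistake can be reproduced with the `timeit` module's Python API, where `setup` runs once and is not timed, while `stmt` (default `'pass'`) is what gets measured on every loop. A small sketch, using the current directory as an arbitrary target and a single timing run rather than the CLI's best-of-N:

```python
import timeit

# Mistaken form: the call sits in `setup`, so it runs once and is not timed;
# only the default statement 'pass' is measured on each loop.
empty_loop = timeit.timeit(setup="import os; os.listdir('.')", number=100_000)

# Intended form: imports in `setup`, the call under test in `stmt`.
listdir_time = timeit.timeit(stmt="listdir('.')", setup="from os import listdir",
                             number=1_000)
scandir_time = timeit.timeit(stmt="list(scandir('.'))", setup="from os import scandir",
                             number=1_000)

# Convert totals to microseconds per loop, comparable to the CLI output.
for label, total, n in [("empty loop", empty_loop, 100_000),
                        ("listdir", listdir_time, 1_000),
                        ("list(scandir)", scandir_time, 1_000)]:
    print(f"{label}: {total / n * 1e6:.3f} usec per loop")
```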
msg363753 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) - Date: 2020-03-09 17:07
> Less reliable how?

See .

> so it looks like scandir has a small overhead when accumulating all results and not using the extra info it returns.

Try with larger directories. The difference may not be so small.

    $ python3 -m timeit -s 'from os import scandir' "list(scandir('/usr/include'))"
    10000 loops, best of 3: 176 usec per loop
    $ python3 -m timeit -s 'from os import listdir' "listdir('/usr/include')"
    10000 loops, best of 3: 114 usec per loop
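One way to check the size dependence without relying on the contents of `/usr/include` is to generate synthetic directories of increasing size in a temporary location. The sketch below times `os.listdir()` against `list(os.scandir())`; the helper name `compare` and the sizes are arbitrary choices, and absolute numbers will depend on the filesystem and OS:

```python
import os
import tempfile
import timeit

def compare(n_entries, number=200):
    """Time os.listdir() against list(os.scandir()) on a temporary
    directory holding n_entries empty files (synthetic test data)."""
    with tempfile.TemporaryDirectory() as d:
        for i in range(n_entries):
            open(os.path.join(d, f"f{i}"), "w").close()
        t_listdir = timeit.timeit(lambda: os.listdir(d), number=number)
        # list() exhausts the scandir iterator, which closes its handle.
        t_scandir = timeit.timeit(lambda: list(os.scandir(d)), number=number)
    # Seconds per call for each approach.
    return t_listdir / number, t_scandir / number

if __name__ == "__main__":
    for n in (10, 100, 1000):
        a, b = compare(n)
        print(f"{n:5d} entries: listdir {a * 1e6:.1f} usec, "
              f"list(scandir) {b * 1e6:.1f} usec")
```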
msg363763 - Author: Barney Gale (barneygale) - Date: 2020-03-09 19:21
Thanks Rémi and Serhiy! Closing this ticket as the patch doesn't provide any sort of improvement.
History
Date User Action Args
2022-04-11 14:59:27 admin set github: 84088
2020-03-09 19:21:58 barneygale set messages: +
2020-03-09 19:17:55 barneygale set status: open -> closed, resolution: not a bug, stage: patch review -> resolved
2020-03-09 17:07:40 serhiy.storchaka set messages: +
2020-03-09 15:19:36 remi.lapeyre set nosy: + remi.lapeyre, messages: +
2020-03-09 14:56:02 barneygale set messages: +
2020-03-09 14:46:13 serhiy.storchaka set messages: +
2020-03-09 14:30:19 barneygale set messages: +
2020-03-09 12:32:54 serhiy.storchaka set messages: +
2020-03-09 12:03:14 xtreak set nosy: + xtreak, serhiy.storchaka, pitrou, messages: +
2020-03-09 00:22:05 barneygale set keywords: + patch, stage: patch review, pull_requests: + <pull_request18223>
2020-03-09 00:17:57 barneygale create