msg363689
Author: Barney Gale (barneygale) *
Date: 2020-03-09 00:17
`pathlib.Path.iterdir()` uses `os.listdir()` rather than `os.scandir()`. I think this has a small performance cost, per PEP 471:

> It returns a generator instead of a list, so that scandir acts as a true iterator instead of returning the full list immediately.

As `scandir()` is already available from `_NormalAccessor`, it's a simple patch to use `scandir()` instead.
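For concreteness, here is a minimal sketch of the two strategies as standalone generators. It is an illustration only, not pathlib's implementation: `iterdir_listdir` and `iterdir_scandir` are hypothetical names, and the sketch calls `os` directly instead of going through the internal `_NormalAccessor`.

```python
import os
from pathlib import Path


def iterdir_listdir(path):
    """Hypothetical listdir-based iterdir(): reads every name up front."""
    for name in os.listdir(path):
        yield Path(path) / name


def iterdir_scandir(path):
    """Hypothetical scandir-based iterdir(): streams entries lazily.

    The directory handle stays open until the generator is exhausted or
    closed, because the with-block only exits at that point.
    """
    with os.scandir(path) as entries:
        for entry in entries:
            yield Path(entry.path)
```

Both yield the same paths. They differ in when the directory is read and in how long the `scandir()` handle stays open, which is what the replies below turn on.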
|
|
msg363721
Author: Karthikeyan Singaravelan (xtreak) *
Date: 2020-03-09 12:03
This optimisation was also hinted at in https://bugs.python.org/issue26032#msg257653
|
|
msg363722
Author: Serhiy Storchaka (serhiy.storchaka) *
Date: 2020-03-09 12:32
It is not so easy. There was a reason why it was not done earlier. scandir() consumes a more limited resource than memory: file descriptors. An open scandir() iterator should also be closed explicitly rather than left to the garbage collector. Consider the example:

```python
def traverse(path, visit):
    for child in path.iterdir():
        if child.is_dir():
            # Recurse into the subdirectory.
            traverse(child, visit)
        else:
            visit(child)
```

With your optimization it may fail with `OSError: Too many open files`.
|
|
msg363734
Author: Barney Gale (barneygale) *
Date: 2020-03-09 14:30
Ah, right you are! The globbing helpers call `list(os.scandir(...))`; perhaps we should do the same here?
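A sketch of that eager variant, again with a hypothetical name (`eager_iterdir`): the `with` block collects every entry and closes the handle before the first path is yielded, so recursion holds no `scandir()` handles open. It produces the same names as `os.listdir()` while building a `DirEntry` object per entry, which is the trade-off the next reply points out.

```python
import os
from pathlib import Path


def eager_iterdir(path):
    """Hypothetical iterdir() that materialises scandir() results first."""
    with os.scandir(path) as entries:
        # Read the whole directory; the handle is closed when the block exits.
        snapshot = list(entries)
    for entry in snapshot:
        yield Path(entry.path)
```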
|
|
msg363736
Author: Serhiy Storchaka (serhiy.storchaka) *
Date: 2020-03-09 14:46
It would be a slower and less reliable reimplementation of os.listdir().
|
|
msg363741
Author: Barney Gale (barneygale) *
Date: 2020-03-09 14:56
Less reliable how? Doesn't appear any slower:

```console
barney.gale@heilbron:~$ python3 -m timeit -s "import os; os.listdir('/usr/local')"
100000000 loops, best of 3: 0.0108 usec per loop
barney.gale@heilbron:~$ python3 -m timeit -s "import os; list(os.scandir('/usr/local'))"
100000000 loops, best of 3: 0.00919 usec per loop
```
|
|
msg363742
Author: Rémi Lapeyre (remi.lapeyre) *
Date: 2020-03-09 15:19
This is not how timeit works: code passed with `-s` is setup, which runs once outside the timed loop, so you only measured the default statement `pass`, i.e. an empty loop. See `python3 -m timeit -h` for how to call it. I think a correct invocation would be:

```console
(venv) ➜ ~ python3 -m timeit -s 'from os import scandir' "list(scandir('/usr/local'))"
10000 loops, best of 5: 24.3 usec per loop
(venv) ➜ ~ python3 -m timeit -s 'from os import listdir' "listdir('/usr/local')"
10000 loops, best of 5: 22.2 usec per loop
```

So it looks like scandir has a small overhead when accumulating all results and not using the extra info it returns.
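The difference between the two invocations can be reproduced with the `timeit` module's Python API. This is a sketch: the directory (the current working directory) and the fixed `number` values are assumptions chosen for illustration, whereas the command-line tool picks the loop count automatically. Code passed as `setup` runs once, outside the timed loop, so the first measurement only times the default statement `pass`.

```python
import os
import timeit

# Assumption: the current working directory is readable; any directory works.
DIRECTORY = os.getcwd()

# Mistaken form: the listdir() call sits in `setup`, which runs once,
# so only the default statement "pass" is timed.
setup_only = timeit.timeit(setup=f"import os; os.listdir({DIRECTORY!r})",
                           number=100_000)

# Corrected form: import once in `setup`, then time the call itself.
listdir_total = timeit.timeit(f"listdir({DIRECTORY!r})",
                              setup="from os import listdir", number=1_000)
scandir_total = timeit.timeit(f"list(scandir({DIRECTORY!r}))",
                              setup="from os import scandir", number=1_000)

# Convert totals to microseconds per loop, like the command-line output.
print(f"setup-only: {setup_only / 100_000 * 1e6:.4f} usec per loop")
print(f"listdir:    {listdir_total / 1_000 * 1e6:.2f} usec per loop")
print(f"scandir:    {scandir_total / 1_000 * 1e6:.2f} usec per loop")
```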
|
|
msg363753
Author: Serhiy Storchaka (serhiy.storchaka) *
Date: 2020-03-09 17:07
> Less reliable how?

See .

> so it looks like scandir has a small overhead when accumulating all results and not using the extra info it returns.

Try with larger directories. The difference may not be so small.

```console
$ python3 -m timeit -s 'from os import scandir' "list(scandir('/usr/include'))"
10000 loops, best of 3: 176 usec per loop
$ python3 -m timeit -s 'from os import listdir' "listdir('/usr/include')"
10000 loops, best of 3: 114 usec per loop
```
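To see how the gap changes with directory size without relying on `/usr/include`, a sketch like the following builds temporary directories of a few illustrative sizes and times both calls. The sizes, the `number` of repetitions, and the helper name `compare` are assumptions; results will vary with the OS, filesystem, and cache state, so no particular numbers are implied.

```python
import os
import tempfile
import timeit


def compare(num_files, number=200):
    """Hypothetical helper: average seconds per call for each approach."""
    with tempfile.TemporaryDirectory() as directory:
        for i in range(num_files):
            # Create empty files so the directory has `num_files` entries.
            open(os.path.join(directory, f"file{i:05d}"), "w").close()
        listdir_time = timeit.timeit(lambda: os.listdir(directory), number=number)
        scandir_time = timeit.timeit(lambda: list(os.scandir(directory)), number=number)
    return listdir_time / number, scandir_time / number


for size in (10, 100, 1000):
    per_listdir, per_scandir = compare(size)
    print(f"{size:5d} entries: listdir {per_listdir * 1e6:8.1f} usec, "
          f"scandir {per_scandir * 1e6:8.1f} usec")
```

Wrapping each call in a lambda adds a small constant overhead to both measurements, so the comparison between them is what matters.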
|
|
msg363763
Author: Barney Gale (barneygale) *
Date: 2020-03-09 19:21
Thanks Rémi and Serhiy! Closing this ticket, as the patch doesn't provide any sort of improvement.
|
|