msg279745 - (view) |
Author: Marian Beermann (enkore) |
Date: 2016-10-30 20:02 |
The use of os.listdir severely limits the speed of this function on anything except solid-state drives. Using the new-in-Python 3.5 os.scandir should eliminate this bottleneck. |
|
|
msg279801 - (view) |
Author: Josh Rosenberg (josh.r) *  |
Date: 2016-10-31 16:37 |
You need to cache the names up front because the loop is unlinking entries, and readdir isn't consistent when the directory entries are mutated between calls. https://github.com/kripken/emscripten/issues/2528 FindFirstFile/FindNextFile likely has similar issues, even if they're not consistently seen (due to DeleteFile itself not guaranteeing deletion until the last handle to the file is closed). scandir might save some stat calls, but you'd need to convert it from generator to list before the loop begins, which would limit the savings a bit. |
|
|
msg279813 - (view) |
Author: Marian Beermann (enkore) |
Date: 2016-10-31 17:40 |
The main issue on *nix is more likely that by using listdir you get directory order, while what you really need is inode ordering. scandir allows for that, since you get the inode from the DirEntry with no extra syscalls - especially without an open() or stat(). Other optimizations are also possible. For example opening the directory and using unlinkat() would likely shave off a bit of CPU. But the dominating factor here is likely the bad access pattern. |
|
|
msg280213 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2016-11-07 17:29 |
Proposed patch implements shutil.rmtree using os.scandir. Needed file descriptors support in os.scandir (). I did not test how this affects the performance of shutil.rmtree. |
|
|
msg281618 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2016-11-24 08:27 |
Benchmarks show about 20% speed up. |
|
|
msg305000 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2017-10-25 16:25 |
Following Antoine's suggestion the patch now makes shutil.rmtree() using os.scandir() on all platforms. I doubt about one thing. This patch changes os.listdir passed to the onerror handler to os.scandir. This can break a user code that checks if the first argument in onerror is os.listdir. If keep this change, it should be documented in the "Porting to 3.7" section. Alternatively, we can continue passing os.listdir if os.scandir() failed despites the fact that os.listdir no longer used. |
|
|
msg305001 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2017-10-25 16:25 |
I think we should change to os.scandir. No need to accumulate compatibility baggage like that. |
|
|
msg305556 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2017-11-04 12:16 |
New changeset d4d79bc1ff91b04625c312f0219c89aabcd19ce4 by Serhiy Storchaka in branch 'master': bpo-28564: Use os.scandir() in shutil.rmtree(). (#4085) https://github.com/python/cpython/commit/d4d79bc1ff91b04625c312f0219c89aabcd19ce4 |
|
|
msg309219 - (view) |
Author: Niklas Hambüchen (nh2) |
Date: 2017-12-30 05:15 |
I've filed https://bugs.python.org/issue32453, which is about O(n^2) deletion behaviour for large directories. |
|
|