Issue 27194: Tarfile superfluous truncate calls slows extraction.

Created on 2016-06-03 04:56 by fried, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name | Uploaded | Description
truncate.patch | fried, 2016-06-03 04:56 | patch to move truncate for only sparse tar entries.
test.py | fried, 2016-06-03 04:58 | test file to generate random tar for benchmark
Messages (4)
msg267035 - Author: Jason Fried (fried) * Date: 2016-06-03 04:56
With large tar file extracts I noticed that tarfile was slower than it should be. It seems that on Linux, for large files (10 MB), truncate is not always a free operation even when it should be a no-op. For example: the file is already 10 MB, and tarfile still seeks to the end and truncates.

I created a script to test the validity of this patch. It generates two random tar archives containing 1024 files of 10 MB each. The files are randomized so disk caching should not interfere.

Extracting those 1g tar files gave the following results:

Time Delta for TarFile: 148.23699307441711
Time Delta for FastTarFile: 107.71058106422424
Time Diff: 40.52641201019287 0.27338932859929255

The last value is the time difference as a fraction of the TarFile time, roughly 27%.
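The idea described above (truncate only when the entry is sparse) can be sketched as follows. This is a minimal illustration, not the contents of truncate.patch: the names `copy_exact` and `write_member`, the chunk size, and the `(offset, length)` representation of sparse regions are all assumptions made for the example.

```python
import io
import tempfile

CHUNK = 16 * 1024  # arbitrary copy buffer size chosen for this sketch


def copy_exact(source, target, length):
    """Copy exactly `length` bytes from source to target (hypothetical helper)."""
    remaining = length
    while remaining > 0:
        chunk = source.read(min(CHUNK, remaining))
        if not chunk:
            raise EOFError("unexpected end of archive data")
        target.write(chunk)
        remaining -= len(chunk)


def write_member(source, target, size, sparse=None):
    """Write one archive member; `sparse` is an assumed list of (offset, length) pairs."""
    if sparse is None:
        # Regular entry: after the copy the file already has the right
        # length, so no seek or truncate is needed.
        copy_exact(source, target, size)
        return
    # Sparse entry: write each data region at its offset, then fix the
    # final length, since the file may end in a hole.
    for offset, length in sparse:
        target.seek(offset)
        copy_exact(source, target, length)
    target.seek(size)
    target.truncate()


# Usage: a 10-byte sparse member with data at offsets 0 and 6.
with tempfile.TemporaryFile() as target:
    write_member(io.BytesIO(b"abcd"), target, size=10, sparse=[(0, 2), (6, 2)])
    target.seek(0)
    sparse_result = target.read()
```

For a regular entry the copy alone leaves the file at its final size, which is why the seek and truncate can be skipped without changing the result.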
msg267037 - Author: Jason Fried (fried) * Date: 2016-06-03 04:58
I ran this on Linux ext4. I didn't see much improvement on OSX with SSD.
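Since the effect depends on the platform and filesystem, one rough way to check a given machine is to time the seek-to-end plus truncate on files that already have their final size. The sketch below is not the attached test.py; the function name `time_noop_truncate` and its parameters are hypothetical, and the numbers it prints will vary with the filesystem behind the temporary directory.

```python
import os
import tempfile
import time


def time_noop_truncate(size, rounds=5, directory=None):
    """Return total seconds spent in seek + truncate on files already `size` bytes long."""
    total = 0.0
    for _ in range(rounds):
        # `directory` selects where the temp file lives, e.g. a path on ext4.
        with tempfile.TemporaryFile(dir=directory) as f:
            f.write(os.urandom(size))
            f.flush()  # keep the buffered-write cost out of the measurement
            start = time.perf_counter()
            f.seek(size)
            f.truncate()  # file is already `size` bytes, so this changes nothing
            total += time.perf_counter() - start
    return total


if __name__ == "__main__":
    ten_mb = 10 * 1024 * 1024
    print(f"seek+truncate on 5 files of 10 MB: {time_noop_truncate(ten_mb):.6f} s")
```

Pass `directory` to point the measurement at a specific mount, because the default temporary directory may be on a different filesystem than the extraction target.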
msg268297 - Author: Roundup Robot (python-dev) (Python triager) Date: 2016-06-11 23:59
New changeset b63474aa8a5f by Łukasz Langa in branch '3.5':
Issue #27194: superfluous truncate calls in tarfile.py slow down extraction
https://hg.python.org/cpython/rev/b63474aa8a5f

New changeset a4f918de25e5 by Łukasz Langa in branch 'default':
Merge 3.5, issue #27194
https://hg.python.org/cpython/rev/a4f918de25e5
msg268298 - Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2016-06-12 00:02
Thanks for the patch, Jason. This is now merged and will be available in 3.5.2 and 3.6.
History
Date User Action Args
2022-04-11 14:58:31 admin set github: 71381
2016-06-12 00:02:19 lukasz.langa set status: open -> closed; resolution: fixed; messages: +
2016-06-11 23:59:33 python-dev set nosy: + python-dev; messages: +
2016-06-11 17:28:55 lukasz.langa set assignee: lukasz.langa; stage: patch review
2016-06-03 05:07:30 serhiy.storchaka set nosy: + lars.gustaebel
2016-06-03 04:58:33 fried set files: + test.py; messages: +
2016-06-03 04:56:33 fried create