Issue 32529: Call readinto in shutil.copyfileobj

Created on 2018-01-10 22:11 by YoSTEALTH, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 5158 closed YoSTEALTH,2018-01-11 13:18
PR 5147 closed YoSTEALTH,2018-01-11 13:42
Messages (7)
msg309784 - (view) Author: (YoSTEALTH) * Date: 2018-01-10 22:11
improved "copyfileobj" function to use less memory
msg309787 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2018-01-11 00:58
Looks like you want to use a "readinto" method to reduce data copying. One problem is that it is not specified exactly what kind of object "copyfileobj" supports reading from. The documentation only says "file-like". According to the glossary, this means io.RawIOBase, BufferedIOBase, or TextIOBase. However TextIOBase doesn't have a "readinto" method. And it wouldn't be hard to find that someone has written their own class that doesn't have "readinto" either. The other problem is you still need to support a negative "length" value, which is easier to do by calling "read".
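To make the two concerns above concrete, here is a minimal sketch of how a "readinto" fast path could coexist with a "read" fallback. It is not the code from either pull request; the function name `copyfileobj_readinto` is hypothetical, and the fallback policy (use "read" whenever "readinto" is missing or "length" is negative) is an assumption chosen for illustration.

```python
import io


def copyfileobj_readinto(fsrc, fdst, length=16 * 1024):
    """Hypothetical sketch: copy fsrc to fdst, reusing one buffer when possible."""
    readinto = getattr(fsrc, "readinto", None)

    if readinto is None or length < 0:
        # Fallback path: text streams (TextIOBase) and some user-written
        # classes have no readinto, and a negative length means
        # "read until EOF", which read() already supports.
        while True:
            chunk = fsrc.read(length)
            if not chunk:
                break
            fdst.write(chunk)
        return

    # Fast path: a single preallocated buffer is refilled on every
    # iteration, so no new bytes object is created per chunk.
    buf = bytearray(length)
    view = memoryview(buf)
    while True:
        n = readinto(view)
        if not n:  # 0 at EOF (or None from a non-blocking raw stream)
            break
        fdst.write(view[:n])


if __name__ == "__main__":
    src = io.BytesIO(b"spam" * 100_000)
    dst = io.BytesIO()
    copyfileobj_readinto(src, dst)
    print(dst.getvalue() == src.getvalue())
```

Whether the reused buffer actually saves a measurable amount of time or memory is exactly what would have to be benchmarked.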
msg309803 - (view) Author: (YoSTEALTH) * Date: 2018-01-11 12:17
Martin, your points got me thinking... to make a proper copy of a file, it should be done using bytes! Text IO could easily lead to corrupting your file. For example (current function):

    with open(old_path, 'r', encoding='latin-1') as fsrc:
        with open(new_path, 'w') as fdst:
            copyfileobj(fsrc, fdst)

This would leave you with a wrongly encoded file with a different file size.
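The size change described above can be reproduced with a short script. This is a sketch under stated assumptions: the file names and the sample text are made up, and the destination encoding is set to UTF-8 explicitly (the example in the message relies on the platform default, which varies).

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    old_path = os.path.join(tmp, "old.txt")      # hypothetical file names
    new_path = os.path.join(tmp, "new.txt")
    bin_path = os.path.join(tmp, "new_binary.txt")

    # "é" is a single byte (0xE9) in latin-1.
    with open(old_path, "wb") as f:
        f.write("café".encode("latin-1"))

    # Text-mode copy: decoded as latin-1, re-encoded as UTF-8 (assumed here),
    # where "é" becomes two bytes.
    with open(old_path, "r", encoding="latin-1") as fsrc:
        with open(new_path, "w", encoding="utf-8") as fdst:
            shutil.copyfileobj(fsrc, fdst)

    # Binary-mode copy: bytes pass through untouched.
    with open(old_path, "rb") as fsrc:
        with open(bin_path, "wb") as fdst:
            shutil.copyfileobj(fsrc, fdst)

    original_size = os.path.getsize(old_path)
    text_copy_size = os.path.getsize(new_path)
    binary_copy_size = os.path.getsize(bin_path)

print(original_size, text_copy_size, binary_copy_size)
```

The text-mode copy is still a valid file, just in a different encoding than the source, which is why opening both ends in binary mode is the safer way to duplicate a file.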
msg309804 - (view) Author: (YoSTEALTH) * Date: 2018-01-11 13:32
Ok, updated the patch to account for:
- improved memory usage for bytes io using readinto
- still supporting negative length
- potential encoding mismatch bug fix while using text io

Did I miss anything?
msg309807 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-01-11 14:02
Please provide benchmark results to demonstrate the benefits of this change.
msg309808 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2018-01-11 14:16
I already blocked the PR with a request for benchmarks and proper tests.
msg309817 - (view) Author: (YoSTEALTH) * Date: 2018-01-11 17:36
Here are the links to the benchmark:
https://repl.it/@altendky/timeit-of-proposed-shutilfileobjcopy
https://gist.github.com/altendky/ff5ccee2baf9822dce69ae8aa66a0fdf
https://paste.pound-python.org/show/urORPXztcbDlqXKTORAj/ # time ./file.py

The problem is that the result times are so sporadic that I am not sure what to post as the result! I have tried with an actual file as well and the results are all over the place. If any of you out there can run a benchmark that produces accurate results and post them, it would help.
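One common way to tame noisy timings is to repeat the measurement several times and keep the minimum, as the `timeit` documentation suggests, and to copy between in-memory streams so disk caching does not dominate. The sketch below is not the benchmark linked above; the payload size, repeat counts, and helper names (`copy_with_read`, `copy_with_readinto`, `best_time`) are arbitrary assumptions.

```python
import io
import shutil
import timeit

PAYLOAD = b"x" * (1024 * 1024)  # 1 MiB of in-memory data (arbitrary size)
CHUNK = 16 * 1024


def copy_with_read():
    # Baseline: the standard library implementation.
    shutil.copyfileobj(io.BytesIO(PAYLOAD), io.BytesIO(), CHUNK)


def copy_with_readinto():
    # Hypothetical readinto-based variant, reusing one buffer.
    fsrc, fdst = io.BytesIO(PAYLOAD), io.BytesIO()
    view = memoryview(bytearray(CHUNK))
    while True:
        n = fsrc.readinto(view)
        if not n:
            break
        fdst.write(view[:n])


def best_time(func, repeat=5, number=10):
    # The minimum over several runs is the least disturbed by other
    # processes; the result is seconds per single call.
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


if __name__ == "__main__":
    for func in (copy_with_read, copy_with_readinto):
        print(f"{func.__name__}: {best_time(func) * 1e3:.3f} ms per copy")
```

Results from in-memory streams say nothing about real disk or network I/O, so a run against actual files would still be needed before drawing conclusions.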
History
Date User Action Args
2022-04-11 14:58:56 admin set github: 76710
2018-01-17 18:07:32 serhiy.storchaka set resolution: rejected
2018-01-17 18:04:42 YoSTEALTH set status: open -> closed; stage: patch review -> resolved
2018-01-11 17:36:26 YoSTEALTH set messages: +
2018-01-11 14:16:50 christian.heimes set nosy: + christian.heimes; messages: +
2018-01-11 14:02:58 serhiy.storchaka set nosy: + serhiy.storchaka; messages: +
2018-01-11 13:42:36 YoSTEALTH set pull_requests: + <pull_request5014>
2018-01-11 13:32:15 YoSTEALTH set messages: +
2018-01-11 13:26:06 YoSTEALTH set pull_requests: - <pull_request5004>
2018-01-11 13:18:01 YoSTEALTH set keywords: + patch; stage: patch review; pull_requests: + <pull_request5013>
2018-01-11 12:17:11 YoSTEALTH set messages: +
2018-01-11 00:58:03 martin.panter set nosy: + martin.panter; messages: +; title: improved shutil.py function -> Call readinto in shutil.copyfileobj
2018-01-10 22:11:02 YoSTEALTH create