Issue 13159: _io.FileIO uses a quadratic-time buffer growth algorithm (original) (raw)

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/57368

classification

Title: _io.FileIO uses a quadratic-time buffer growth algorithm
Type: performance Stage: resolved
Components: IO Versions: Python 3.2, Python 3.3, Python 2.7

process

Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: nadeem.vawda Nosy List: benjamin.peterson, nadeem.vawda, pitrou, python-dev, stutzbach, vstinner
Priority: normal Keywords: patch

Created on 2011-10-12 16:02 by nadeem.vawda, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
buffer-growth.diff nadeem.vawda,2011-10-12 17:19 review
Messages (7)
msg145403 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2011-10-12 16:01
As mentioned in issue 6715, the buffer growth strategy used by _io.FileIO has quadratic running time if each reallocation is O(n). The code in question is new_buffersize() from Modules/_io/fileio.c: if (currentsize > SMALLCHUNK) { /* Keep doubling until we reach BIGCHUNK; then keep adding BIGCHUNK. */ if (currentsize <= BIGCHUNK) return currentsize + currentsize; else return currentsize + BIGCHUNK; } return currentsize + SMALLCHUNK; Is there a reason for this? One possible improvement would be to instead use the same formula as list.resize() in Objects/listobject.c: new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6); which has amortized O(n) running time behaviour. Your thoughts?
msg145418 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2011-10-12 17:19
I've attached a patch that makes the suggested change to FileIO, and also to _bz2.BZ2Compressor/Decompressor (which currently have the same issue).
msg145443 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-10-12 23:38
The patch looks good to me, thanks.
msg145448 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-10-13 11:45
New changeset d18c80a8c119 by Nadeem Vawda in branch '3.2': Issue #13159: Replace FileIO's quadratic-time buffer growth algorithm with a linear-time one. http://hg.python.org/cpython/rev/d18c80a8c119 New changeset 4a6709a071d0 by Nadeem Vawda in branch 'default': Merge #13159: Replace FileIO's quadratic-time buffer growth algorithm with a linear-time one. http://hg.python.org/cpython/rev/4a6709a071d0
msg145449 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-10-13 11:58
New changeset c1c434e30e06 by Nadeem Vawda in branch '2.7': Issue #13159: Replace FileIO's quadratic-time buffer growth algorithm with a linear-time one. http://hg.python.org/cpython/rev/c1c434e30e06
msg145450 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-10-13 12:04
Thank you :)
msg145453 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2011-10-13 14:01
No problem :)
History
Date User Action Args
2022-04-11 14:57:22 admin set github: 57368
2011-10-13 14:01:59 nadeem.vawda set status: open -> closedmessages: + assignee: nadeem.vawdaresolution: fixedstage: patch review -> resolved
2011-10-13 12:04:31 pitrou set messages: +
2011-10-13 11:58:31 python-dev set messages: +
2011-10-13 11:45:44 python-dev set nosy: + python-devmessages: +
2011-10-12 23:38:22 pitrou set messages: + versions: + Python 2.7, Python 3.2
2011-10-12 17:24:31 vstinner set nosy: + vstinner
2011-10-12 17:20:01 nadeem.vawda set stage: needs patch -> patch review
2011-10-12 17:19:15 nadeem.vawda set files: + buffer-growth.diffkeywords: + patchmessages: +
2011-10-12 16:02:00 nadeem.vawda create