Issue 4533: 3.0 file.read dreadfully slow (original) (raw)

Issue4533

Created on 2008-12-04 18:30 by terry.reedy, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
fileio_buffer.patch christian.heimes,2008-12-04 22:35
fileio_buffer2.patch christian.heimes,2008-12-04 23:37
Messages (10)
msg76915 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2008-12-04 18:30
C.l.p poster reported that 3.0 file.read is orders of magnitude slower than with 2.5 (but confused issue with buffer = 0). Jerry Hill reported "Here's a quick comparison between 2.5 and 3.0 on a relatively small 17 meg file: C:\>c:\Python30\python -m timeit -n 1 "open('C:\\work\\temp\\bppd_vsub.csv', 'rb').read()" 1 loops, best of 3: 36.8 sec per loop C:\>c:\Python25\python -m timeit -n 1 "open('C:\\work\\temp\\bppd_vsub.csv', 'rb').read()" 1 loops, best of 3: 33 msec per loop That's 3 orders of magnitude slower on python3.0!" I verified this informally on WinXP by opening and then reading Doc/Pythonxy.chm (about 4 megs) -- an eye blink versus 3 seconds, repeated. Even the open seemed slower but I did not time it. >>> f=open('Doc/Python30.chm','rb') >>> d=f.read()
msg76920 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-12-04 20:21
This needs definitely some testing!
msg76934 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-12-04 21:56
The small buffer size in Modules/_fileio.c is one reason for the slowness. $ dd if=/dev/zero of=zeros bs=1MB count=50 $ cat testread.py open("zeros", "rb").read() $ ./python -m cProfile testread.py 40 function calls (39 primitive calls) in 4.246 CPU seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 1 0.016 0.016 4.246 4.246 :1() 1 0.000 0.000 0.000 0.000 io.py:277(__new__) 2 0.000 0.000 0.000 0.000 io.py:355(flush) 2 0.000 0.000 0.000 0.000 io.py:364(close) 2 0.000 0.000 0.000 0.000 io.py:376(__del__) 1 0.000 0.000 0.000 0.000 io.py:413(_checkReadable) 1 0.000 0.000 0.000 0.000 io.py:614(__init__) 2 0.000 0.000 0.000 0.000 io.py:618(close) 1 0.000 0.000 0.000 0.000 io.py:708(__init__) 1 0.000 0.000 0.000 0.000 io.py:733(flush) 1 0.000 0.000 0.000 0.000 io.py:736(close) 1 0.000 0.000 0.000 0.000 io.py:755(closed) 1 0.000 0.000 0.000 0.000 io.py:82(open) 1 0.000 0.000 0.000 0.000 io.py:896(__init__) 2 0.000 0.000 0.000 0.000 io.py:905(_reset_read_buf) 1 0.021 0.021 4.230 4.230 io.py:909(read) 1 0.000 0.000 4.209 4.209 io.py:920(_read_unlocked) 1 0.000 0.000 0.000 0.000 {built-in method allocate_lock} 2/1 0.000 0.000 4.246 4.246 {built-in method exec} 1 0.000 0.000 0.000 0.000 {built-in method fstat} 2 0.000 0.000 0.000 0.000 {built-in method isinstance} 3 0.000 0.000 0.000 0.000 {built-in method len} 1 0.000 0.000 0.000 0.000 {method '__enter__' of '_thread.lock' objects} 1 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects} 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} 1 0.000 0.000 0.000 0.000 {method 'fileno' of '_FileIO' objects} 1 0.000 0.000 0.000 0.000 {method 'isatty' of '_FileIO' objects} 1 0.825 0.825 0.825 0.825 {method 'join' of 'bytes' objects} 2 3.384 1.692 3.384 1.692 {method 'read' of '_FileIO' objects} 1 0.000 0.000 0.000 0.000 {method 'readable' of '_FileIO' objects} $ vi Modules/_fileio.c -#define DEFAULT_BUFFER_SIZE (8*1024) +#define DEFAULT_BUFFER_SIZE (80*1024) $ ./python -m cProfile testread.py 40 function calls (39 primitive calls) in 1.273 CPU seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 1 0.019 0.019 1.273 1.273 :1() 1 0.000 0.000 0.000 0.000 io.py:277(__new__) 2 0.000 0.000 0.000 0.000 io.py:355(flush) 2 0.000 0.000 0.000 0.000 io.py:364(close) 2 0.000 0.000 0.000 0.000 io.py:376(__del__) 1 0.000 0.000 0.000 0.000 io.py:413(_checkReadable) 1 0.000 0.000 0.000 0.000 io.py:614(__init__) 2 0.000 0.000 0.000 0.000 io.py:618(close) 1 0.000 0.000 0.000 0.000 io.py:708(__init__) 1 0.000 0.000 0.000 0.000 io.py:733(flush) 1 0.000 0.000 0.000 0.000 io.py:736(close) 1 0.000 0.000 0.000 0.000 io.py:755(closed) 1 0.000 0.000 0.000 0.000 io.py:82(open) 1 0.000 0.000 0.000 0.000 io.py:896(__init__) 2 0.000 0.000 0.000 0.000 io.py:905(_reset_read_buf) 1 0.016 0.016 1.254 1.254 io.py:909(read) 1 0.000 0.000 1.238 1.238 io.py:920(_read_unlocked) 1 0.000 0.000 0.000 0.000 {built-in method allocate_lock} 2/1 0.000 0.000 1.273 1.273 {built-in method exec} 1 0.000 0.000 0.000 0.000 {built-in method fstat} 2 0.000 0.000 0.000 0.000 {built-in method isinstance} 3 0.000 0.000 0.000 0.000 {built-in method len} 1 0.000 0.000 0.000 0.000 {method '__enter__' of '_thread.lock' objects} 1 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects} 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} 1 0.000 0.000 0.000 0.000 {method 'fileno' of '_FileIO' objects} 1 0.000 0.000 0.000 0.000 {method 'isatty' of '_FileIO' objects} 1 1.156 1.156 1.156 1.156 {method 'join' of 'bytes' objects} 2 0.081 0.041 0.081 0.041 {method 'read' of '_FileIO' objects} 1 0.000 0.000 0.000 0.000 {method 'readable' of '_FileIO' objects}
msg76936 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-12-04 22:35
The fileio_buffer.patch implements the same progressive buffer as Python 2.x' Object/fileobject.c.
msg76940 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-12-04 22:50
patch looks good to me. nitpick comments: use += instead of = and + in: newsize = newsize + newsize and newsize = newsize + BIGCHUNK. As for the XXX about overflow, so long as BUFSIZ is not defined to be an insanely large number (it should never be) this will be fine. add a preprocessor test for that in. #if (BUFSIZ >= 2**30) #error "unreasonable BUFSIZ defined" #endif
msg76944 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-12-04 23:37
The preprocessor doesn't handle power. 2 << 24 (64MB) sounds sufficient for me.
msg76971 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-12-05 07:43
fileio_buffer2.patch looks good other than minor touchups: Turn the XXX comment into: /* NOTE: overflow impossible due to limits on BUFSIZ * Also, 2 << 24 is 32MB yet your error message test says >= 64MB. I think you meant 1 << 26. fix those and commit. :)
msg76981 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-12-05 08:33
The updated patch has been committed to 3.0 and 3.1. I'm going to backport the patch to 2.x later.
msg78098 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-12-20 12:48
Since it is solved for 3.x and only needs to be bacported to 2.x (where the "io" module isn't the default), downgrading to critical.
msg90024 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-07-02 16:06
This has been fixed as part of the big IO update in trunk. I assume nobody really cares about making a separate patch for 2.6, please re-open if you are interested!
History
Date User Action Args
2022-04-11 14:56:42 admin set github: 48783
2009-07-02 16:06:30 pitrou set status: open -> closedresolution: accepted -> fixedmessages: +
2008-12-20 12:48:44 pitrou set priority: release blocker -> criticalnosy: + pitroumessages: +
2008-12-20 02:40:30 loewis set priority: deferred blocker -> release blocker
2008-12-10 08:22:55 loewis set priority: release blocker -> deferred blocker
2008-12-05 08:33:25 christian.heimes set resolution: acceptedstage: patch review -> commit reviewmessages: + versions: + Python 2.6, Python 2.7, - Python 3.0, Python 3.1
2008-12-05 07:43:31 gregory.p.smith set messages: +
2008-12-04 23:37:50 christian.heimes set files: + fileio_buffer2.patchmessages: +
2008-12-04 22:50:34 gregory.p.smith set nosy: + gregory.p.smithmessages: +
2008-12-04 22:35:12 christian.heimes set files: + fileio_buffer.patchkeywords: + patchmessages: + stage: test needed -> patch review
2008-12-04 21:56:20 christian.heimes set messages: +
2008-12-04 20:21:30 christian.heimes set versions: + Python 3.1nosy: + christian.heimesmessages: + priority: release blockercomponents: + Extension Modulesstage: test needed
2008-12-04 18:30:19 terry.reedy create