msg76915 - (view) |
Author: Terry J. Reedy (terry.reedy) *  |
Date: 2008-12-04 18:30 |
C.l.p poster reported that 3.0 file.read is orders of magnitude slower than with 2.5 (but confused issue with buffer = 0). Jerry Hill reported "Here's a quick comparison between 2.5 and 3.0 on a relatively small 17 meg file: C:\>c:\Python30\python -m timeit -n 1 "open('C:\\work\\temp\\bppd_vsub.csv', 'rb').read()" 1 loops, best of 3: 36.8 sec per loop C:\>c:\Python25\python -m timeit -n 1 "open('C:\\work\\temp\\bppd_vsub.csv', 'rb').read()" 1 loops, best of 3: 33 msec per loop That's 3 orders of magnitude slower on python3.0!" I verified this informally on WinXP by opening and then reading Doc/Pythonxy.chm (about 4 megs) -- an eye blink versus 3 seconds, repeated. Even the open seemed slower but I did not time it. >>> f=open('Doc/Python30.chm','rb') >>> d=f.read() |
|
|
msg76920 - (view) |
Author: Christian Heimes (christian.heimes) *  |
Date: 2008-12-04 20:21 |
This needs definitely some testing! |
|
|
msg76934 - (view) |
Author: Christian Heimes (christian.heimes) *  |
Date: 2008-12-04 21:56 |
The small buffer size in Modules/_fileio.c is one reason for the slowness. $ dd if=/dev/zero of=zeros bs=1MB count=50 $ cat testread.py open("zeros", "rb").read() $ ./python -m cProfile testread.py 40 function calls (39 primitive calls) in 4.246 CPU seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 1 0.016 0.016 4.246 4.246 :1() 1 0.000 0.000 0.000 0.000 io.py:277(__new__) 2 0.000 0.000 0.000 0.000 io.py:355(flush) 2 0.000 0.000 0.000 0.000 io.py:364(close) 2 0.000 0.000 0.000 0.000 io.py:376(__del__) 1 0.000 0.000 0.000 0.000 io.py:413(_checkReadable) 1 0.000 0.000 0.000 0.000 io.py:614(__init__) 2 0.000 0.000 0.000 0.000 io.py:618(close) 1 0.000 0.000 0.000 0.000 io.py:708(__init__) 1 0.000 0.000 0.000 0.000 io.py:733(flush) 1 0.000 0.000 0.000 0.000 io.py:736(close) 1 0.000 0.000 0.000 0.000 io.py:755(closed) 1 0.000 0.000 0.000 0.000 io.py:82(open) 1 0.000 0.000 0.000 0.000 io.py:896(__init__) 2 0.000 0.000 0.000 0.000 io.py:905(_reset_read_buf) 1 0.021 0.021 4.230 4.230 io.py:909(read) 1 0.000 0.000 4.209 4.209 io.py:920(_read_unlocked) 1 0.000 0.000 0.000 0.000 {built-in method allocate_lock} 2/1 0.000 0.000 4.246 4.246 {built-in method exec} 1 0.000 0.000 0.000 0.000 {built-in method fstat} 2 0.000 0.000 0.000 0.000 {built-in method isinstance} 3 0.000 0.000 0.000 0.000 {built-in method len} 1 0.000 0.000 0.000 0.000 {method '__enter__' of '_thread.lock' objects} 1 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects} 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} 1 0.000 0.000 0.000 0.000 {method 'fileno' of '_FileIO' objects} 1 0.000 0.000 0.000 0.000 {method 'isatty' of '_FileIO' objects} 1 0.825 0.825 0.825 0.825 {method 'join' of 'bytes' objects} 2 3.384 1.692 3.384 1.692 {method 'read' of '_FileIO' objects} 1 0.000 0.000 0.000 0.000 {method 'readable' of '_FileIO' objects} $ vi Modules/_fileio.c -#define DEFAULT_BUFFER_SIZE (8*1024) +#define DEFAULT_BUFFER_SIZE (80*1024) $ ./python -m cProfile testread.py 40 function calls (39 primitive calls) in 1.273 CPU seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 1 0.019 0.019 1.273 1.273 :1() 1 0.000 0.000 0.000 0.000 io.py:277(__new__) 2 0.000 0.000 0.000 0.000 io.py:355(flush) 2 0.000 0.000 0.000 0.000 io.py:364(close) 2 0.000 0.000 0.000 0.000 io.py:376(__del__) 1 0.000 0.000 0.000 0.000 io.py:413(_checkReadable) 1 0.000 0.000 0.000 0.000 io.py:614(__init__) 2 0.000 0.000 0.000 0.000 io.py:618(close) 1 0.000 0.000 0.000 0.000 io.py:708(__init__) 1 0.000 0.000 0.000 0.000 io.py:733(flush) 1 0.000 0.000 0.000 0.000 io.py:736(close) 1 0.000 0.000 0.000 0.000 io.py:755(closed) 1 0.000 0.000 0.000 0.000 io.py:82(open) 1 0.000 0.000 0.000 0.000 io.py:896(__init__) 2 0.000 0.000 0.000 0.000 io.py:905(_reset_read_buf) 1 0.016 0.016 1.254 1.254 io.py:909(read) 1 0.000 0.000 1.238 1.238 io.py:920(_read_unlocked) 1 0.000 0.000 0.000 0.000 {built-in method allocate_lock} 2/1 0.000 0.000 1.273 1.273 {built-in method exec} 1 0.000 0.000 0.000 0.000 {built-in method fstat} 2 0.000 0.000 0.000 0.000 {built-in method isinstance} 3 0.000 0.000 0.000 0.000 {built-in method len} 1 0.000 0.000 0.000 0.000 {method '__enter__' of '_thread.lock' objects} 1 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects} 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} 1 0.000 0.000 0.000 0.000 {method 'fileno' of '_FileIO' objects} 1 0.000 0.000 0.000 0.000 {method 'isatty' of '_FileIO' objects} 1 1.156 1.156 1.156 1.156 {method 'join' of 'bytes' objects} 2 0.081 0.041 0.081 0.041 {method 'read' of '_FileIO' objects} 1 0.000 0.000 0.000 0.000 {method 'readable' of '_FileIO' objects} |
|
|
msg76936 - (view) |
Author: Christian Heimes (christian.heimes) *  |
Date: 2008-12-04 22:35 |
The fileio_buffer.patch implements the same progressive buffer as Python 2.x' Object/fileobject.c. |
|
|
msg76940 - (view) |
Author: Gregory P. Smith (gregory.p.smith) *  |
Date: 2008-12-04 22:50 |
patch looks good to me. nitpick comments: use += instead of = and + in: newsize = newsize + newsize and newsize = newsize + BIGCHUNK. As for the XXX about overflow, so long as BUFSIZ is not defined to be an insanely large number (it should never be) this will be fine. add a preprocessor test for that in. #if (BUFSIZ >= 2**30) #error "unreasonable BUFSIZ defined" #endif |
|
|
msg76944 - (view) |
Author: Christian Heimes (christian.heimes) *  |
Date: 2008-12-04 23:37 |
The preprocessor doesn't handle power. 2 << 24 (64MB) sounds sufficient for me. |
|
|
msg76971 - (view) |
Author: Gregory P. Smith (gregory.p.smith) *  |
Date: 2008-12-05 07:43 |
fileio_buffer2.patch looks good other than minor touchups: Turn the XXX comment into: /* NOTE: overflow impossible due to limits on BUFSIZ * Also, 2 << 24 is 32MB yet your error message test says >= 64MB. I think you meant 1 << 26. fix those and commit. :) |
|
|
msg76981 - (view) |
Author: Christian Heimes (christian.heimes) *  |
Date: 2008-12-05 08:33 |
The updated patch has been committed to 3.0 and 3.1. I'm going to backport the patch to 2.x later. |
|
|
msg78098 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2008-12-20 12:48 |
Since it is solved for 3.x and only needs to be bacported to 2.x (where the "io" module isn't the default), downgrading to critical. |
|
|
msg90024 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2009-07-02 16:06 |
This has been fixed as part of the big IO update in trunk. I assume nobody really cares about making a separate patch for 2.6, please re-open if you are interested! |
|
|