[Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?
Antoine Pitrou solipsis at pitrou.net
Fri Nov 25 12:04:00 CET 2011
On Fri, 25 Nov 2011 20:34:21 +1100 Matt Joiner <anacrolix at gmail.com> wrote:
> It's Python 3.2. I tried it for larger files and got some interesting
> results. readinto() for 10MB files, reading 10MB all at once:
>
> readinto/2.7 100 loops, best of 3: 8.6 msec per loop
> readinto/3.2 10 loops, best of 3: 29.6 msec per loop
> readinto/3.3 100 loops, best of 3: 19.5 msec per loop
>
> With 100KB chunks for the 10MB file (annotated with #):
>
> matt at stanley:~/Desktop$ for f in read bytearrayread readinto; do
>   for v in 2.7 3.2 3.3; do echo -n "$f/$v "; "python$v" -m timeit
>   -s 'import readinto' "readinto.$f()"; done; done
> read/2.7 100 loops, best of 3: 7.86 msec per loop  # this is actually faster than the 10MB read
> read/3.2 10 loops, best of 3: 253 msec per loop  # wtf?
> read/3.3 10 loops, best of 3: 747 msec per loop  # wtf??
No "wtf" here, the read() loop is quadratic since you're building a new, larger, bytes object every iteration. Python 2 has a fragile optimization for concatenation of strings, which can avoid the quadratic behaviour on some systems (depends on realloc() being fast).
> readinto/2.7 100 loops, best of 3: 8.93 msec per loop
> readinto/3.2 100 loops, best of 3: 10.3 msec per loop  # suddenly 3.2 is performing well?
> readinto/3.3 10 loops, best of 3: 20.4 msec per loop
What if you allocate the bytearray outside of the timed function?
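Something like this, for instance (the file name is a hypothetical placeholder and the sizes just match the numbers above), so that only readinto() itself sits inside the timed statement:

    import timeit

    # Allocate the buffer and open the file once, in the setup, so the
    # timed statement measures readinto() alone.
    setup = (
        "buf = bytearray(10 * 1024 * 1024)\n"
        "f = open('10mb.bin', 'rb')"
    )
    stmt = "f.seek(0); f.readinto(buf)"

    # best of 3 runs of 100 loops each, like `python -m timeit`
    best = min(timeit.repeat(stmt, setup=setup, repeat=3, number=100))
    print("%.3g msec per loop" % (best / 100 * 1e3))

That way the allocation cost of the 10MB bytearray doesn't get charged to readinto() on every loop.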
Regards
Antoine.