Learning Python 3rd Edition: Python 3.0 Speed Tests
[Update] For related and more recent results, also see the following pages:
This page documents a major performance regression in Python 3.0 (3.0.0) that made most I/O operations much slower than they were in Python 2.X, up to 1,000 times slower in some cases.
This issue was largely repaired in later 3.X releases: Python 3.0.1 included minor improvements, and Python 3.1 fixed the problem more broadly, making this specific issue a moot point for most Python users today. That being said, 3.X's performance still generally lags behind that of 2.X for many other types of code not measured here; see Learning Python's benchmarking chapter for details.
This page describes the original 3.0 speed issue and the timing techniques used in testing, for historical context, and for users who might have 3.0.0 installed. The two pages listed above provide updated results to show the improvements made in later 3.X releases, but they employ testing techniques described on this page, so you'll probably want to start reading here first.
Python 3.0 I/O is Radically Slower
_[Original post: January 2009]_ A crucial factor for many programmers to consider: 3.0 has major performance issues that may preclude its widespread use until a future optimizing release. While some operations run roughly as fast in 3.0 as in 2.6, others do not. In particular, file I/O is so slow in 3.0 as to be completely impractical for a large set of Python programs.
Before I get into details, I want to stress that this issue is hopefully a temporary one, and you probably don't need to care about it at all if you process small files only. This is primarily an issue for programs that spend a significant amount of their time scanning large files. However, that's common enough that the issue is sufficient cause for many programmers to avoid using the 3.X line until an optimized version is released (perhaps 3.0.2 or 3.1, given current development plans).
Unfortunately, this problem doesn't seem to be getting the priority it deserves today. Perhaps worse, broad 3.0 design decisions such as the string/unicode merge and I/O library redesign may make it difficult to ever bring 3.X up to speed with 2.X in terms of I/O performance. If you care about I/O speed, please feel free to help elevate this in the Python world.
What others have reported
So how bad is it? According to posted benchmarks starting to filter in, some binary I/O can be 3 to 4 times slower in 3.0, writing text files can be 5 to 8 times slower, and reading text files line-by-line can run 40 to 70 times slower in 3.0 than in 2.X (that's times, not percent!). Reading large files all-at-once can be even worse -- hundreds of times slower in 3.0 typically, and sometimes even slower than that.
This is apparently due to an I/O library rewrite (which aims to replace some of the underlying C library), changes in buffer allocation schemes, and possibly the new all-Unicode focus. The net effect can be observed in one posted test, in which a simple line-by-line iteration over a fairly large (66M) file:
for line in open('aBigFile.txt'): pass
takes roughly half a second (0.65 seconds) in Python 2.6, but between 32 and 42 seconds under Python 3.0, depending on whether the file is opened in text or binary mode. That is, 3.0 is between 48 and 66 times slower than 2.6 when reading text line by line.
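For reference, a minimal way to reproduce this sort of measurement is to wrap the loop in wall-clock timer calls. This is only a sketch, not the posted benchmark itself; the file name comes from the example above, and time.time is coarse but works the same way in both 2.6 and 3.0:

import time

def time_line_iteration(path):
    # Time a line-by-line scan: the same pattern measured in the posted test
    start = time.time()
    for line in open(path):             # text mode by default in 2.6 and 3.0
        pass
    return time.time() - start

print(time_line_iteration('aBigFile.txt'))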
Since I/O is a major time component for many programs that scan or massage data, going from half a second to 30 or 40 seconds is indeed a BIG DEAL, and a major disappointment to many Python users (myself included). Another way to look at this is that I/O time for programs that read text by lines has essentially increased from N seconds to N minutes in 3.0 -- enough to qualify as impractical for many programs.
Although reading line-by-line is probably the most common way to process text, this slowdown can get better or worse, depending on how you process files (though it's present in all normal modes). A formal timing benchmark written by Python developers shows that, compared to 2.6, 3.0 is:
- 5X to 6X slower at reading 1 or 20 characters at a time, in a 400K file
- 48X slower at reading line by line, in a 400K file
- 40X slower at reading 4K bytes at a time, in a 400K file
- 20X, 19X, and 12X slower at reading 20K, 400K, and 10M all at once, respectively
Much worse, according to some reports, reading large files all at once can take up to a shocking 1,000 times longer in 3.0. For instance, one benchmark posted on comp.lang.python shows a .read() of a 17M file going from 33 msecs in 2.5 to 36.8 seconds in 3.0. That's a slowdown of 3 orders of magnitude, or 1,000X. Additionally, some tests show that:
- Writing text files is 5X to 8X slower in 3.0
- The 3.0 print function is 50 times slower than 2.X's print statement
- Writing large chunks of data in 3.0 apparently makes a copy of the data being written, thereby potentially doubling the memory footprint of a program.
Python 3.0 runs slower than 2.X overall too (3.0 reportedly runs the pystone benchmark 10% slower than 2.5), but its I/O speed is a critical regression. In fact, it's not an exaggeration to say that 3.0 has effectively broken Python performance for many users, at least until this is resolved.
For more background on this speed issue, see the next section, as well as the following web pages from which I gleaned some of the statistics above (try a web search for additional discussions):
- http://bugs.python.org/issue4561 has the bug report, from which I borrowed many of the statistics mentioned above. Be sure to see the full results of the new file performance benchmark there.
- http://bugs.python.org/issue4533 has a related bug report, citing the test mentioned above that shows 3.0 to be slower than 2.5 by 3 orders of magnitude when reading a 17M file all at once (yes, that's some 1,000 times slower!).
- http://bugs.python.org/issue4565 is another related bug report, which states that write performance for text files is 5X to 8X slower in Python 3.0 than in Python 2.5 (and 2.6, presumably), and provides another benchmark to prove it.
- Another web page with similar benchmark results: located here.
Results of my own 3.0 speed tests
Since writing the prior section, I've run some tests of my own to verify the speed problem in 3.0. Their code is available off-page, if you want to fetch the scripts and try them on your own:
- <timeSEQ.py> -- various sequence iteration timing tests
- <timeIO.py> -- various file read and write timing tests
- <timeboth.py> -- run timeIO.py under 2.6 and 3.0
- <timebothCMP.py> -- same as prior, but compares outputs
- timebothCMP.out.txt -- output of timebothCMP.py
The short story is that I/O is at least as bad in 3.0 as noted by others, though non-I/O operations seem roughly as fast. In a bit more detail, the first of the files listed above times a non-I/O, CPU-intensive operation -- sequence iteration alternatives. When timing an operation like this, 2.6 is only slightly faster than 3.0:
C:\misc>C:\Python26\python timeSEQ.py
2.6 (r26:66721, Oct 2 2008, 11:35:03) [MSC v.1500 32 bit (Intel)]
forStatement => 5.517, [-10000, -9998]...[9996, 9998]
listComprehension => 3.958, [-10000, -9998]...[9996, 9998]
mapFunction => 4.054, [-10000, -9998]...[9996, 9998]
generatorExpression => 4.560, [-10000, -9998]...[9996, 9998]

C:\misc>C:\Python30\python timeSEQ.py
3.0 (r30:67507, Dec 3 2008, 20:14:27) [MSC v.1500 32 bit (Intel)]
forStatement => 5.753, [-10000, -9998]...[9996, 9998]
listComprehension => 4.399, [-10000, -9998]...[9996, 9998]
mapFunction => 4.492, [-10000, -9998]...[9996, 9998]
generatorExpression => 4.968, [-10000, -9998]...[9996, 9998]
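The linked timeSEQ.py is the authoritative script; the following is only a minimal sketch of its technique, runnable unchanged under both 2.6 and 3.0. The repetition count here is illustrative, and the x * 2 operation over range(-5000, 5000) is inferred from the result samples shown above, so the actual script's parameters may differ:

import sys, time

reps = 1000                          # illustrative repetition count
nums = range(-5000, 5000)            # yields the [-10000...9998] samples above

def timer(label, test):
    # Run a test function reps times; report total time and a result sample
    start = time.time()
    for i in range(reps):
        result = test()
    print('%s => %.3f, %s...%s' %
          (label, time.time() - start, result[:2], result[-2:]))

def for_statement():
    res = []
    for x in nums:
        res.append(x * 2)
    return res

print(sys.version)
timer('forStatement', for_statement)
timer('listComprehension', lambda: [x * 2 for x in nums])
timer('mapFunction', lambda: list(map(lambda x: x * 2, nums)))
timer('generatorExpression', lambda: list(x * 2 for x in nums))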
However, the rest of the files listed above time file I/O specifically. The last listed file summarizes the relative speeds of I/O on my Windows Vista laptop, under both 2.6 and 3.0. For a text test file, I concatenated Python 2.6's NEWS.txt file to itself multiple times; my binary test file is Python 3.0's MSI installer file for Windows; and the write tests each produce a 25M file, whether text or binary.
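The idioms timed by these scripts are the standard file-processing patterns. Here is a minimal sketch of the test structure, not the actual timeIO.py code; the block size is a placeholder, and the 50-character line and repetition count are chosen only to produce a 25M output file like the tests described above:

import time

def timed(label, test):
    # Run one test and report elapsed wall-clock time
    start = time.time()
    test()
    print('%s => %f' % (label, time.time() - start))

def read_by_lines(path, mode):
    for line in open(path, mode):
        pass

def read_by_blocks(path, mode, size=4096):       # placeholder block size
    f = open(path, mode)
    while f.read(size):                          # empty result ends the loop
        pass
    f.close()

def read_all_at_once(path, mode):
    open(path, mode).read()

def write_by_lines(path, mode, line, count):
    f = open(path, mode)
    for i in range(count):
        f.write(line)
    f.close()

timed('read_byLines_textMode', lambda: read_by_lines('large.txt', 'r'))
timed('read_byBlocks_binaryMode', lambda: read_by_blocks('large.txt', 'rb'))
timed('read_allAtOnce_textMode', lambda: read_all_at_once('large.txt', 'r'))
timed('write_byLines_textMode',
      lambda: write_by_lines('testIO.out', 'w', 'spam!' * 10, 524288))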
As you can see from the results file listed below, my results are in line with those reported by others and summarized above. Python 3.0 is much slower -- typically around 50 times slower, and over 800 times slower in the worst case. Moreover, writing data can be up to 12 times slower than in 2.6, depending on how the data is written:
Output data sizes: 524288 50 50 26214400 26214400

[Python 2.6: large.txt, 2935401 bytes]
read_byLines_textMode (large.txt=2.80M) => 0.024772
read_byLines_binaryMode (large.txt=2.80M) => 0.015806
read_byBlocks_textMode (large.txt=2.80M) => 0.009921
read_byBlocks_binaryMode (large.txt=2.80M) => 0.001622
read_allAtOnce_textMode (large.txt=2.80M) => 0.011466
read_allAtOnce_binaryMode (large.txt=2.80M) => 0.003305

[Python 2.6: large.bin, 13168640 bytes]
read_byBlocks_binaryMode (large.bin=12.56M) => 0.010219
read_allAtOnce_binaryMode (large.bin=12.56M) => 0.016566

[Python 2.6: testIO.out, 26214400 bytes]
write_byLines_textMode (testIO.out=25.00M) => 0.453313
write_byLines_binaryMode (testIO.out=25.00M) => 1.235918
write_byBlocks_textMode (testIO.out=25.00M) => 1.464019
write_byBlocks_binaryMode (testIO.out=25.00M) => 0.560694
write_allAtOnce_textMode (testIO.out=25.00M) => 0.174307
write_allAtOnce_binaryMode (testIO.out=25.00M) => 0.579587

Output data sizes: 524288 50 50 26214400 26214400

[Python 3.0: large.txt, 2935401 bytes]
read_byLines_textMode (large.txt=2.80M) => 1.024868
read_byLines_binaryMode (large.txt=2.80M) => 1.299153
read_byBlocks_textMode (large.txt=2.80M) => 0.539751
read_byBlocks_binaryMode (large.txt=2.80M) => 0.002815
read_allAtOnce_textMode (large.txt=2.80M) => 0.718675
read_allAtOnce_binaryMode (large.txt=2.80M) => 0.670614

[Python 3.0: large.bin, 13168640 bytes]
read_byBlocks_binaryMode (large.bin=12.56M) => 0.014730
read_allAtOnce_binaryMode (large.bin=12.56M) => 13.695692

[Python 3.0: testIO.out, 26214400 bytes]
write_byLines_textMode (testIO.out=25.00M) => 5.409266
write_byLines_binaryMode (testIO.out=25.00M) => 2.160819
write_byBlocks_textMode (testIO.out=25.00M) => 4.927914
write_byBlocks_binaryMode (testIO.out=25.00M) => 1.884943
write_allAtOnce_textMode (testIO.out=25.00M) => 1.313119
write_allAtOnce_binaryMode (testIO.out=25.00M) => 0.603439

==Summary==

[read_byLines_textMode (large.txt=2.80M)] => 3.0 is 41.372 times slower
[read_byLines_binaryMode (large.txt=2.80M)] => 3.0 is 82.194 times slower
[read_byBlocks_textMode (large.txt=2.80M)] => 3.0 is 54.405 times slower
[read_byBlocks_binaryMode (large.txt=2.80M)] => 3.0 is 1.736 times slower
[read_allAtOnce_textMode (large.txt=2.80M)] => 3.0 is 62.679 times slower
[read_allAtOnce_binaryMode (large.txt=2.80M)] => 3.0 is 202.909 times slower
[read_byBlocks_binaryMode (large.bin=12.56M)] => 3.0 is 1.441 times slower
[read_allAtOnce_binaryMode (large.bin=12.56M)] => 3.0 is 826.735 times slower
[write_byLines_textMode (testIO.out=25.00M)] => 3.0 is 11.933 times slower
[write_byLines_binaryMode (testIO.out=25.00M)] => 3.0 is 1.748 times slower
[write_byBlocks_textMode (testIO.out=25.00M)] => 3.0 is 3.366 times slower
[write_byBlocks_binaryMode (testIO.out=25.00M)] => 3.0 is 3.362 times slower
[write_allAtOnce_textMode (testIO.out=25.00M)] => 3.0 is 7.533 times slower
[write_allAtOnce_binaryMode (testIO.out=25.00M)] => 3.0 is 1.041 times slower
Some footnotes on my results
Please study the linked source files for more details on these tests, but a few notes are in order:
- The usual caveats regarding test variables apply. It's unlikely that you will get identical results, but they will probably be similar. In my testing, these results tend to vary substantially by file size, but 3.0 only seems to grow slower as input files grow larger. Content matters too; for the text read tests, for example, copying the input file in ASCII mode instead of binary mode on Windows (using "copy x.txt /A + y.txt /A y.txt") produced the better results for 3.0 listed above, presumably either because binary mode results in very long lines read all at once by Python, or because some embedded characters caused Unicode encoding changes. Finally, I/O tests are also impacted by caches and memory management in your operating system, and even by test ordering issues. In my experiments, though, reordering tests made only a slight difference at best, and usually made no difference at all (and test orderings are not a valid way to disqualify speed tests anyhow -- real programs may have arbitrary operation orderings, some of which are bound to mirror those of testing scripts). Regardless of such test variables, it's clear that I/O runs much more slowly in 3.0.
- Notice that the truly binary "large.bin" input file is not run through the text mode tests (3.0 cannot decode its data into strings), or the line-oriented tests (there are no real line breaks in binary data).
- I've omitted even worse 3.0 results for larger input files, because those tests can take too long to run (for my patience level, at least); in one large-file test that did terminate, for example, a read case took over 3,000 times longer to run in 3.0 than in 2.6.
- When reading by blocks, 3.0 is 54X slower than 2.6 in text mode, but only 1.7X slower in binary mode. Extra work is probably required in 3.0 to handle the possibility that a character can span multiple bytes and be only partially read in block mode (though this was probably not a factor for the ASCII input file used).
- As you can see, reading a large file all at once is horribly slow in 3.0; reading by blocks is much better, and only 50% slower than 2.6. If you care about speed for binary files in 3.0, read by blocks instead of all at once (a sketch of this idiom appears after this list).
- When reading either by lines or all at once, 3.0's binary mode performs 2 or 3 times as badly as text mode relative to 2.6 for the same file. For instance, reading by lines is some 41X slower in text mode, but 82X slower in binary mode. However, 3.0's absolute times for text and binary mode in these tests are similar (see above the "Summary" line); the relative difference stems entirely from the fact that 2.6's binary mode is noticeably faster in absolute terms in these cases. Although it's impossible to be certain without further research, the fact that text and binary modes take roughly the same absolute time in 3.0 seems to discount Unicode as the culprit behind the 3.0 slowdown (text mode would otherwise be slower, due to the need to decode bytes and create Unicode objects). The slowdown seems more likely to lie in the I/O libraries or memory management. Then again, 2.6's binary mode may simply be optimized in a way 3.0's is not.
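As promised above, here is a minimal sketch of the read-by-blocks idiom for binary files. The 64K block size is an arbitrary choice, and the byte count merely stands in for real per-block work:

def scan_binary(path, blocksize=65536):          # arbitrary block size
    # Read a binary file in fixed-size blocks rather than all at once;
    # per the results above, this sidesteps 3.0's worst-case slowdown.
    total = 0
    f = open(path, 'rb')
    while True:
        block = f.read(blocksize)                # empty result at end-of-file
        if not block:
            break
        total += len(block)                      # stand-in for real processing
    f.close()
    return total

print(scan_binary('large.bin'))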
Discounting non-idiomatic cases for 3.0
Most of the tests I run are valid use cases for 3.0, as well as common 2.6 coding patterns that will be migrated to 3.0. Strictly speaking, because new 3.0 programs will likely make a binding choice between str for text data and bytes for binary data, the input tests "read_byLines_binaryMode" and "read_byBlocks_textMode", as well as the output tests "write_byLines_binaryMode" and "write_byBlocks_textMode" will probably be relatively atypical in 3.0 practice. Discounting these tests leaves the following:
[read_byLines_textMode (large.txt=2.80M)] => 3.0 is 41.372 times slower
[read_byBlocks_binaryMode (large.txt=2.80M)] => 3.0 is 1.736 times slower
[read_allAtOnce_textMode (large.txt=2.80M)] => 3.0 is 62.679 times slower
[read_allAtOnce_binaryMode (large.txt=2.80M)] => 3.0 is 202.909 times slower
[read_byBlocks_binaryMode (large.bin=12.56M)] => 3.0 is 1.441 times slower
[read_allAtOnce_binaryMode (large.bin=12.56M)] => 3.0 is 826.735 times slower
[write_byLines_textMode (testIO.out=25.00M)] => 3.0 is 11.933 times slower
[write_byBlocks_binaryMode (testIO.out=25.00M)] => 3.0 is 3.362 times slower
[write_allAtOnce_textMode (testIO.out=25.00M)] => 3.0 is 7.533 times slower
[write_allAtOnce_binaryMode (testIO.out=25.00M)] => 3.0 is 1.041 times slower
Further, the "allAtOnce" variants will not work for pathologically large files, too big for your computer's memory space. If we discount these tests as well in the interest of robust programs, we're left with the following cases:
[read_byLines_textMode (large.txt=2.80M)] => 3.0 is 41.372 times slower
[read_byBlocks_binaryMode (large.txt=2.80M)] => 3.0 is 1.736 times slower
[read_byBlocks_binaryMode (large.bin=12.56M)] => 3.0 is 1.441 times slower
[write_byLines_textMode (testIO.out=25.00M)] => 3.0 is 11.933 times slower
[write_byBlocks_binaryMode (testIO.out=25.00M)] => 3.0 is 3.362 times slower
This still doesn't look great for 3.0, but at least it omits some of the more negative findings. I don't think it's valid to discount all the other cases this way, though, for two reasons:
- They represent valid and common coding patterns in 2.6 that will likely be migrated to 3.0 unchanged, and may be used by 2.X programmers in new 3.0 code. 3.0's speed on all these test cases matters, even if they are not ideally idiomatic for 3.0. All-at-once modes, for example, are very commonly used.
- They may run faster than more typical modes in 3.0, and so provide optimization options. For example, "write_byLines_binaryMode" shows an order-of-magnitude smaller slowdown than the more idiomatic "write_byLines_textMode" (1.7X versus 11.9X in the summary above), and all-at-once modes sometimes run quicker in general.
In any event, even without some cases that are arguably less common or ideal in 3.0, it still comes out 41 times slower for reading lines from a text file, 50% slower for reading binary data, and 3 to 12 times slower for writing data.
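For concreteness, here is a minimal sketch of the idiomatic 3.0 pairings assumed in this section: str objects with text-mode files, and bytes objects with binary-mode files. The file names are hypothetical:

# Text: str objects and text-mode files (encoded/decoded, newlines mapped)
f = open('notes.txt', 'w')
f.write('spam\n')                        # str in
f.close()
print(open('notes.txt').read())          # str out

# Binary: bytes objects and binary-mode files (raw bytes, no translations)
f = open('data.bin', 'wb')
f.write(b'\x00\x01\x02')                 # bytes in (b'...' works in 2.6 too)
f.close()
print(open('data.bin', 'rb').read())     # bytes out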
Please feel free to fetch and play with these tests on your own, varying test file sizes and other parameters. You should also test your specific file usage patterns. My tests try to capture all common and valid file processing modes for 2.6 and 3.0, but they are not necessarily universally applicable. Further, system caching, test ordering, and other factors can impact test outcomes, so always verify on your own. At the least, you can rerun these scripts in future 3.X releases to see if the issue has been resolved.