[Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?

Eli Bendersky eliben at gmail.com
Fri Nov 25 07:41:47 CET 2011


> Eli, the use pattern I was referring to is when you read in chunks and append to a running buffer. Presumably, if you know the size of the data in advance, you can readinto directly to a region of a bytearray, thereby avoiding having to allocate a temporary buffer for the read and to create a new buffer containing the running buffer plus the new data.
>
> Strangely, I find that your readandcopy is faster at this, but not by much, than readinto. Here's the code; it's a bit explicit, but then so was the original:
>
>     import os
>
>     BUFSIZE = 0x10000
>
>     def justread():
>         # Just read a file's contents into a string/bytes object
>         f = open(FILENAME, 'rb')
>         s = b''
>         while True:
>             b = f.read(BUFSIZE)
>             if not b:
>                 break
>             s += b
>
>     def readandcopy():
>         # Read a file's contents and copy them into a bytearray.
>         # An extra copy is done here.
>         f = open(FILENAME, 'rb')
>         s = bytearray()
>         while True:
>             b = f.read(BUFSIZE)
>             if not b:
>                 break
>             s += b
>
>     def readinto():
>         # Read a file's contents directly into a bytearray,
>         # hopefully employing its buffer interface
>         f = open(FILENAME, 'rb')
>         s = bytearray(os.path.getsize(FILENAME))
>         o = 0
>         while True:
>             b = f.readinto(memoryview(s)[o:o+BUFSIZE])
>             if not b:
>                 break
>             o += b
>
> And the timings:
>
>     $ python3 -O -m timeit 'import filereadbytearray' 'filereadbytearray.justread()'
>     10 loops, best of 3: 298 msec per loop
>     $ python3 -O -m timeit 'import filereadbytearray' 'filereadbytearray.readandcopy()'
>     100 loops, best of 3: 9.22 msec per loop
>     $ python3 -O -m timeit 'import filereadbytearray' 'filereadbytearray.readinto()'
>     100 loops, best of 3: 9.31 msec per loop
>
> The file was 10MB. I expected readinto to perform much better than readandcopy. I expected readandcopy to perform slightly better than justread. This clearly isn't the case.

What is 'python3' on your machine? If it's 3.2, then this is consistent with my results. Try it with 3.3 and a larger file (say ~100MB and up); you may then see the same speed as on 2.7.
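For anyone who wants to repeat the comparison on a larger file, here is a rough sketch of a self-contained harness. It is not the original filereadbytearray module: the helper names make_test_file and best_time, the path-taking function signatures, and the 100 MB default are assumptions made for illustration only.

```python
import os
import tempfile
import timeit

BUFSIZE = 0x10000


def read_and_copy(path):
    # Chunked read(); every chunk is copied into the growing bytearray.
    s = bytearray()
    with open(path, 'rb') as f:
        while True:
            b = f.read(BUFSIZE)
            if not b:
                break
            s += b
    return s


def read_into(path):
    # Chunked readinto() into slices of a preallocated bytearray.
    s = bytearray(os.path.getsize(path))
    view = memoryview(s)
    o = 0
    with open(path, 'rb') as f:
        while True:
            n = f.readinto(view[o:o + BUFSIZE])
            if not n:
                break
            o += n
    return s


def make_test_file(size_mb):
    # Hypothetical helper: write size_mb megabytes of data to a temp file.
    fd, path = tempfile.mkstemp()
    chunk = os.urandom(1024 * 1024)
    with os.fdopen(fd, 'wb') as f:
        for _ in range(size_mb):
            f.write(chunk)
    return path


def best_time(func, path, number=3, repeat=3):
    # Best average time per call, in seconds, over `repeat` runs.
    runs = timeit.repeat(lambda: func(path), number=number, repeat=repeat)
    return min(runs) / number


if __name__ == '__main__':
    path = make_test_file(100)  # assumed size; adjust as needed
    try:
        for func in (read_and_copy, read_into):
            print(func.__name__, best_time(func, path))
    finally:
        os.remove(path)
```

Running the same script under each interpreter of interest (2.7 would need small adjustments) gives numbers that can be compared directly, since the file and the chunk size stay fixed.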

Also, why do you think chunked reads are better here than slurping the whole file into the bytearray in one go? If you need it wholly in memory anyway, why not just issue a single read?
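To make the single-read alternative concrete, here is a minimal sketch, assuming the file size is known up front from os.path.getsize. The function name slurp is hypothetical. The loop is only there to handle short reads, so in the common case readinto is called once.

```python
import os


def slurp(path):
    # Preallocate the whole buffer and fill it with as few readinto()
    # calls as possible (normally just one).
    size = os.path.getsize(path)
    buf = bytearray(size)
    view = memoryview(buf)
    filled = 0
    with open(path, 'rb') as f:
        while filled < size:
            n = f.readinto(view[filled:])
            if not n:
                break  # file turned out shorter than reported
            filled += n
    # Return the buffer itself unless the file shrank while reading.
    return buf if filled == size else buf[:filled]
```

Compared with the chunked versions, this avoids both the per-chunk temporary bytes objects and the repeated slicing of the memoryview.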

Eli