[Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?

Matt Joiner anacrolix at gmail.com
Fri Nov 25 07:13:45 CET 2011


On Fri, Nov 25, 2011 at 12:07 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:

On Fri, 25 Nov 2011 12:02:17 +1100 Matt Joiner <anacrolix at gmail.com> wrote:

It's my impression that the readinto method does not fully support the buffer interface I was expecting. I've never had cause to use it until now. I've created a question on Stack Overflow that describes my confusion:

http://stackoverflow.com/q/8263899/149482

Just use a memoryview and slice it:

    b = bytearray(...)
    m = memoryview(b)
    n = f.readinto(m[someoffset:])
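A minimal, self-contained sketch of the memoryview-slicing idiom suggested above, using `io.BytesIO` as a stand-in for a real file (an assumption for illustration). Slicing a memoryview does not copy, so `readinto` writes straight into the chosen region of the underlying `bytearray`. The helper name `read_at_offset` is hypothetical.

```python
import io


def read_at_offset(f, buf: bytearray, offset: int) -> int:
    """Read from f directly into buf, starting at offset (hypothetical helper).

    The slice m[offset:] shares memory with buf, so readinto() fills
    buf[offset:] in place without creating an intermediate bytes object.
    """
    m = memoryview(buf)
    return f.readinto(m[offset:])


# Usage: read 5 bytes into the tail of a 10-byte buffer.
buf = bytearray(10)
f = io.BytesIO(b"hello")
n = read_at_offset(f, buf, 5)
print(n, bytes(buf))  # 5 b'\x00\x00\x00\x00\x00hello'
```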

Cheers, this seems to be what I wanted. Unfortunately, it doesn't perform noticeably better when I do this.

Eli, the use pattern I was referring to is reading in chunks and appending to a running buffer. Presumably, if you know the size of the data in advance, you can readinto directly into a region of a bytearray, thereby avoiding both the allocation of a temporary buffer for each read and the creation of a new buffer containing the running buffer plus the newly read data.
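When the total size is known in advance, that pattern can be sketched as follows: allocate the `bytearray` once, take a single `memoryview` of it, and let `readinto` fill successive regions until the buffer is full or EOF is reached. The function name `read_exact` and its `chunk` parameter are hypothetical, and `io.BytesIO` stands in for a real file.

```python
import io


def read_exact(f, size, chunk=0x10000):
    """Read up to `size` bytes from f into one preallocated bytearray.

    No temporary bytes objects are created and the buffer is never
    reallocated while reading: each readinto() call writes straight
    into the next region of buf.
    """
    buf = bytearray(size)
    pos = 0
    with memoryview(buf) as view:  # one view, released when the block exits
        while pos < size:
            n = f.readinto(view[pos:pos + chunk])  # slicing does not copy
            if not n:  # 0 means EOF before the buffer was filled
                break
            pos += n
    del buf[pos:]  # trim if the stream was shorter than expected
    return buf


data = read_exact(io.BytesIO(b"abc"), 10, chunk=2)
print(bytes(data))  # b'abc'
```

Releasing the view before trimming matters: a `bytearray` cannot be resized while a memoryview still exports its buffer.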

Strangely, I find that your readandcopy is faster at this than readinto, though not by much. Here's the code; it's a bit explicit, but then so was the original:

    import os

    BUFSIZE = 0x10000
    # FILENAME (the path of the 10MB test file) is defined elsewhere; not shown here.

    def justread():
        # Just read a file's contents into a string/bytes object
        f = open(FILENAME, 'rb')
        s = b''
        while True:
            b = f.read(BUFSIZE)
            if not b:
                break
            s += b

    def readandcopy():
        # Read a file's contents and copy them into a bytearray.
        # An extra copy is done here.
        f = open(FILENAME, 'rb')
        s = bytearray()
        while True:
            b = f.read(BUFSIZE)
            if not b:
                break
            s += b

    def readinto():
        # Read a file's contents directly into a bytearray,
        # hopefully employing its buffer interface
        f = open(FILENAME, 'rb')
        s = bytearray(os.path.getsize(FILENAME))
        o = 0
        while True:
            b = f.readinto(memoryview(s)[o:o+BUFSIZE])
            if not b:
                break
            o += b

And the timings:

    $ python3 -O -m timeit 'import fileread_bytearray' 'fileread_bytearray.justread()'
    10 loops, best of 3: 298 msec per loop
    $ python3 -O -m timeit 'import fileread_bytearray' 'fileread_bytearray.readandcopy()'
    100 loops, best of 3: 9.22 msec per loop
    $ python3 -O -m timeit 'import fileread_bytearray' 'fileread_bytearray.readinto()'
    100 loops, best of 3: 9.31 msec per loop

The file was 10MB. I expected readinto to perform much better than readandcopy. I expected readandcopy to perform slightly better than justread. This clearly isn't the case.
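To reproduce the comparison without the module's undefined `FILENAME`, here is a self-contained harness, offered as a sketch under assumptions: it writes a temporary 10 MB file of random bytes, uses `with` blocks so files are closed, takes the file path as a parameter, and times each variant with `timeit`; `main` and its `size`/`number` parameters are hypothetical. One plausible reason for the large gap between justread and the other two: in CPython, `bytes` is immutable, so each `s += b` allocates a new object and copies everything read so far, making the total copying quadratic in the file size, whereas `bytearray +=` extends the existing buffer in place with over-allocation. Absolute numbers will vary with the machine, OS cache, and Python version.

```python
import os
import tempfile
import timeit

BUFSIZE = 0x10000


def justread(path):
    # bytes is immutable: each += builds a new object and copies all data so far
    s = b''
    with open(path, 'rb') as f:
        while True:
            b = f.read(BUFSIZE)
            if not b:
                break
            s += b
    return s


def readandcopy(path):
    # bytearray += extends in place (amortized), copying only the new chunk
    s = bytearray()
    with open(path, 'rb') as f:
        while True:
            b = f.read(BUFSIZE)
            if not b:
                break
            s += b
    return s


def readinto(path):
    # Preallocate once, then let readinto() fill successive regions
    s = bytearray(os.path.getsize(path))
    o = 0
    with open(path, 'rb') as f, memoryview(s) as m:
        while True:
            n = f.readinto(m[o:o + BUFSIZE])
            if not n:
                break
            o += n
    return s


def main(size=10 * 1024 * 1024, number=3):
    # Write a temporary file of random bytes, time each reader, then clean up
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(os.urandom(size))
        for func in (justread, readandcopy, readinto):
            t = timeit.timeit(lambda: func(path), number=number)
            print(f'{func.__name__}: {t / number * 1000:.2f} msec per call')
    finally:
        os.remove(path)


if __name__ == '__main__':
    main()
```

All three functions return the file's contents, which makes it easy to check that they agree before trusting the timings.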

Also, I saw some comments about "top-posting". Am I guilty of this?

If there's a magical option in Gmail that someone knows about, please tell me.

Kind of :)

Regards

Antoine.




