
Eli, the use pattern I was referring to is when you read in chunks and
append each chunk to a running buffer. Presumably, if you know the size
of the data in advance, you can readinto directly into a region of a
bytearray. That avoids both allocating a temporary buffer for each read
and creating a new buffer that holds the running buffer plus the new
chunk.
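A minimal sketch of that in-place pattern, using io.BytesIO as a stand-in for a real file (an assumption made only so the example is self-contained). readinto writes straight into a memoryview slice of a preallocated bytearray and returns how many bytes it stored, so no separate bytes object is created for each chunk. The function name fill_in_place and the tiny chunk size are hypothetical, chosen for illustration.

```python
import io

def fill_in_place(stream, size, chunk=4):
    # Hypothetical helper: preallocate the destination once (size assumed known).
    buf = bytearray(size)
    with memoryview(buf) as view:
        offset = 0
        while offset < size:
            # readinto copies into this slice of buf and returns the number
            # of bytes written (0 once the stream is exhausted).
            n = stream.readinto(view[offset:offset + chunk])
            if not n:
                break
            offset += n
    return buf, offset

data = b"hello, bytearray"
buf, filled = fill_in_place(io.BytesIO(data), len(data))
print(filled, bytes(buf))  # prints: 16 b'hello, bytearray'
```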

Strangely, I find that your readandcopy is faster at this than readinto,
though not by much. Here's the code; it's a bit explicit, but then so
was the original:

```python
import os

BUFSIZE = 0x10000
FILENAME = 'testfile.bin'  # placeholder: path to the ~10MB test file

def justread():
    # Just read a file's contents into a string/bytes object
    with open(FILENAME, 'rb') as f:
        s = b''
        while True:
            b = f.read(BUFSIZE)
            if not b:
                break
            s += b
    return s

def readandcopy():
    # Read a file's contents and copy them into a bytearray.
    # An extra copy is done here.
    with open(FILENAME, 'rb') as f:
        s = bytearray()
        while True:
            b = f.read(BUFSIZE)
            if not b:
                break
            s += b
    return s

def readinto():
    # Read a file's contents directly into a bytearray,
    # hopefully employing its buffer interface
    s = bytearray(os.path.getsize(FILENAME))
    o = 0
    with open(FILENAME, 'rb') as f, memoryview(s) as view:
        while True:
            # Slicing the memoryview gives a window onto s without copying
            n = f.readinto(view[o:o + BUFSIZE])
            if not n:
                break
            o += n
    return s
```

And the timings:

```
$ python3 -O -m timeit 'import fileread_bytearray' 'fileread_bytearray.justread()'
10 loops, best of 3: 298 msec per loop
$ python3 -O -m timeit 'import fileread_bytearray' 'fileread_bytearray.readandcopy()'
100 loops, best of 3: 9.22 msec per loop
$ python3 -O -m timeit 'import fileread_bytearray' 'fileread_bytearray.readinto()'
100 loops, best of 3: 9.31 msec per loop
```

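The file was 10MB. I expected readinto to perform much better than
readandcopy, and readandcopy to perform only slightly better than
justread. This clearly isn't the case: readinto is no faster than
readandcopy, while readandcopy is more than 30 times faster than
justread.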
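A rough back-of-the-envelope model may help explain why justread is the outlier. bytes objects are immutable, so each `s += b` in justread builds a new object and copies everything accumulated so far, which makes the total copying quadratic in the number of chunks; a bytearray's `+=` extends the existing buffer, so each chunk is copied roughly once. The sketch below only counts bytes copied under those two assumptions. It ignores bytearray reallocation overhead and other interpreter details, and it takes the file to be exactly 10 MiB, which is an assumption. The function name copied_bytes is hypothetical.

```python
def copied_bytes(total, chunk):
    """Rough count of bytes copied while accumulating `total` bytes in `chunk`-sized reads."""
    n = -(-total // chunk)  # number of reads (ceiling division)
    # Immutable bytes: read k produces a fresh object holding all data so far.
    immutable = sum(min(k * chunk, total) for k in range(1, n + 1))
    # Mutable bytearray: each chunk is appended (copied) once; reallocations ignored.
    mutable = total
    return immutable, mutable

imm, mut = copied_bytes(10 * 2**20, 0x10000)
print(imm, mut, imm // mut)  # prints: 844103680 10485760 80
```

It is only a model, but copying roughly 80 times more data is at least consistent with justread being so much slower than the two bytearray versions.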

What is 'python3' on your machine? If it's 3.2, then this is consistent with my results. Try it with 3.3 and a larger file (say ~100MB and up); you may see the same speed as on 2.7.
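To follow that suggestion, something along these lines prints which interpreter is running and writes a larger test file chunk by chunk, so the file's contents never have to be held in memory at once. The helper name make_test_file and the example path are hypothetical, and filling the file from os.urandom is an arbitrary choice.

```python
import os
import sys

def make_test_file(path, size, chunk=0x100000):
    # Hypothetical helper: write `size` random bytes to `path`, one chunk at a time.
    remaining = size
    with open(path, 'wb') as f:
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(os.urandom(n))
            remaining -= n
    return os.path.getsize(path)

print(sys.version)  # shows which version "python3" actually is
# e.g. make_test_file('testfile.bin', 100 * 2**20) for a ~100MB file
```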

Also, why do you think chunked reads are better here than slurping the whole file into the bytearray in one go? If you need it wholly in memory anyway, why not just issue a single read?
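For comparison, a single-read version might look like this sketch (the function names are hypothetical). It sizes the bytearray from os.fstat on the already-open file and issues one readinto; on a buffered binary file, readinto may issue several reads to the underlying raw stream until the buffer is full or EOF is reached, so for a regular file one call should suffice. If a bytearray isn't actually required, a plain f.read() does the same job in one call.

```python
import os

def read_whole_into(path):
    # Hypothetical helper: one readinto into a bytearray sized from the open file.
    with open(path, 'rb') as f:
        buf = bytearray(os.fstat(f.fileno()).st_size)
        n = f.readinto(buf)  # buffered reader keeps reading until full or EOF
    # Trim in case the file shrank between fstat and the read.
    del buf[n:]
    return buf

def read_whole(path):
    # Simplest alternative when an immutable bytes object is fine.
    with open(path, 'rb') as f:
        return f.read()
```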
Eli