[Python-Dev] Prefetching on buffered IO files
Antoine Pitrou solipsis at pitrou.net
Tue Sep 28 00:41:19 CEST 2010
- Previous message: [Python-Dev] Mark PEP 3148 as Final?
- Next message: [Python-Dev] Prefetching on buffered IO files
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hello,
While trying to solve #3873 (poor performance of pickle on file objects, due to the overhead of calling read() with very small values), it occurred to me that the prefetching facilities offered by BufferedIOBase are not flexible and efficient enough.
Indeed, if you use seek() and read(): 1) you limit yourself to seekable files; 2) performance can be hampered by very poor seek() performance (this is true of GzipFile).
If instead you use peek() and read(), the situation is better, but you end up making multiple copies of the data; also, you must call read() to advance the file pointer even though you don't care about its result.
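To illustrate the peek()-then-read() pattern described above, here is a small sketch (using CPython's BufferedReader over an in-memory stream; the data is just an example) showing the two copies it forces:

```python
import io

# Reading a 4-byte header with today's API: peek() copies the bytes out of
# the internal buffer, then read() copies them again just to advance.
f = io.BufferedReader(io.BytesIO(b"spam and eggs"))

header = f.peek(4)[:4]  # copy #1: peek() does not advance the file pointer
f.read(4)               # copy #2: read and discarded, only to advance
```

After this, the file pointer sits past the header, but the header bytes have been copied twice.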
So I would propose adding the following method to BufferedIOBase:
prefetch(self, buffer, skip, minread)

Skip skip bytes from the stream. Then, try to read at least minread bytes and write them into buffer. The file pointer is advanced by at most skip + minread, or less if the end of file was reached. The total number of bytes written into buffer is returned, which can be more than minread if additional bytes could be prefetched (but, of course, cannot be more than len(buffer)).
Arguments:
- buffer: a writable buffer (e.g. bytearray)
- skip: number of bytes to skip (must be >= 0)
- minread: number of bytes to read (must be >= 0 and <= len(buffer))
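As a rough sketch of these semantics (a hypothetical pure-Python helper on top of an existing BufferedReader, not the proposal itself, which would presumably live in C inside the buffered object):

```python
import io

def prefetch(f, buffer, skip, minread):
    """Sketch of the proposed semantics over a BufferedReader."""
    f.read(skip)  # discard: works on unseekable streams too
    view = memoryview(buffer)
    # Read at least minread bytes, advancing the file pointer
    # (may return fewer only at end of file).
    nread = f.readinto(view[:minread])
    # Opportunistically copy whatever is already buffered into the
    # remaining room, *without* advancing the pointer.
    room = len(buffer) - nread
    extra = f.peek(room)[:room]
    view[nread:nread + len(extra)] = extra
    return nread + len(extra)
```

Note how the pointer ends up advanced by exactly skip + minread (barring EOF), while the caller may still receive more than minread bytes.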
Also, the BufferedIOBase ABC could then provide default implementations of read(), readinto() and peek(), simply by calling prefetch(). (How read1() fits into the picture is not obvious.)
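Those default implementations could be expressed along these lines (a sketch; the mixin and the toy in-memory stream used to exercise it are hypothetical):

```python
class PrefetchMixin:
    """Hypothetical defaults built on the proposed prefetch() method."""

    def read(self, n):
        buf = bytearray(n)
        nread = self.prefetch(buf, 0, n)   # advance by at most n
        return bytes(buf[:nread])

    def readinto(self, b):
        return self.prefetch(b, 0, len(b))

    def peek(self, n):
        buf = bytearray(n)
        got = self.prefetch(buf, 0, 0)     # minread=0: don't advance at all
        return bytes(buf[:got])


class BytesPrefetcher(PrefetchMixin):
    """Toy in-memory stream implementing the proposed prefetch() semantics."""

    def __init__(self, data):
        self._data = data
        self._pos = 0

    def prefetch(self, buffer, skip, minread):
        # Skip, then expose up to len(buffer) bytes, advancing the
        # pointer by only minread (capped at EOF).
        self._pos = min(self._pos + skip, len(self._data))
        avail = self._data[self._pos:self._pos + len(buffer)]
        buffer[:len(avail)] = avail
        self._pos = min(self._pos + minread, len(self._data))
        return len(avail)
```

With minread=0, prefetch() naturally degenerates into peek(); with minread=len(buffer), it degenerates into readinto().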
What do you think?
Regards
Antoine.