[Python-Dev] Prefetching on buffered IO files

Antoine Pitrou solipsis at pitrou.net
Wed Sep 29 10:55:56 CEST 2010


On Wed, 29 Sep 2010 10:06:57 +0200 Hagen Fürstenau <hagen at zhuliguan.net> wrote:

>> Ow... I've always assumed that seek() is essentially free, because
>> that's how a typical OS kernel implements it. If seek() is bad on
>> GzipFile, how hard would it be to fix this?
>
> I'd imagine that there's no easy way to make arbitrary seeks on a
> GzipFile fast. But wouldn't it be enough to optimize small relative
> (backwards) seeks?

As I explained to Guido, GzipFile doesn't know the buffering size of its consumer (apart from introducing couplings), and therefore has no way to know how much information it must retain.

To reiterate, there's a complicated solution (optimize an implementation-dependent behaviour of GzipFile, with a non-trivial coding effort and performance tradeoff) which will not work on unseekable files anyway. And there's a more generic solution involving non-seeking primitives such as read() + peek().
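To make the second option concrete, here is a minimal sketch of sniffing a stream's first bytes with peek() and then reading it in full, without ever calling seek(). The names NonSeekable and sniff_then_read are hypothetical, made up for this illustration; the only assumption about the stream is that it offers peek() and read() the way io.BufferedReader does.

```python
import io


class NonSeekable(io.RawIOBase):
    """Hypothetical raw stream that cannot seek (stands in for a pipe or socket)."""

    def __init__(self, data):
        self._src = io.BytesIO(data)

    def readable(self):
        return True

    def readinto(self, b):
        # Delegate to the in-memory source; seekable() stays False (IOBase default).
        return self._src.readinto(b)


def sniff_then_read(buffered, magic_len=4):
    """Return (header, data) using only peek() and read(), never seek().

    `buffered` is assumed to provide peek(), like io.BufferedReader.
    Note that peek() may return fewer or more bytes than requested, so the
    header is whatever is available, truncated to magic_len.
    """
    header = buffered.peek(magic_len)[:magic_len]  # does not advance the position
    data = buffered.read()                         # still starts at the first byte
    return header, data


if __name__ == "__main__":
    reader = io.BufferedReader(NonSeekable(b"MAGICrest of the stream"))
    header, data = sniff_then_read(reader, 5)
    print(header, len(data))
```

Because the lookahead lives in the consumer's own buffer, the producer (GzipFile or anything else) does not need to know how much data to retain, and the same code works on unseekable files.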

(follow-up to python-ideas, if I didn't mess up the headers)

Regards

Antoine.


