Issue 849407: urllib reporthook could be more informative
A reporthook in urllib.urlretrieve() (in 2.3.2) is given the max number of characters accepted ("bs") per .read() as its second argument. It would be more helpful to receive the number of characters actually retrieved in the most recent block.
While perhaps this would break some existing code (though I can't imagine how), the minor patches below will allow accurate progress updates and the like.
Thanks
Allan Wilson
*** urllib.py.old Tue Nov 25 17:42:55 2003
--- urllib.py Tue Nov 25 18:00:50 2003
*** 236,248 ****
              reporthook(0, bs, size)
          block = fp.read(bs)
          if reporthook:
!             reporthook(1, bs, size)
          while block:
              tfp.write(block)
              block = fp.read(bs)
              blocknum = blocknum + 1
              if reporthook:
!                 reporthook(blocknum, bs, size)
          fp.close()
          tfp.close()
          del fp
--- 236,248 ----
              reporthook(0, bs, size)
          block = fp.read(bs)
          if reporthook:
!             reporthook(1, len(block), size)
          while block:
              tfp.write(block)
              block = fp.read(bs)
              blocknum = blocknum + 1
              if reporthook:
!                 reporthook(blocknum, len(block), size)
          fp.close()
          tfp.close()
          del fp
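To illustrate what the patch changes from the hook's point of view, here is a minimal Python 3 sketch. It is not the urllib code itself: `copy_with_hook`, `run`, and the `report_actual` flag are hypothetical names made up for this example, and an in-memory `io.BytesIO` stands in for the network connection. The loop mirrors the calls shown in the diff, with `report_actual=False` passing `bs` every time and `report_actual=True` passing `len(block)`.

```python
import io


def copy_with_hook(src, dst, reporthook, bs=8192, size=-1, report_actual=False):
    """Simplified stand-in for the copy loop in the diff (hypothetical helper).

    report_actual=False mimics the original calls (always pass bs).
    report_actual=True mimics the patched calls (pass len(block)).
    """
    blocknum = 1
    reporthook(0, bs, size)  # initial call, made before any data is read
    block = src.read(bs)
    reporthook(1, len(block) if report_actual else bs, size)
    while block:
        dst.write(block)
        block = src.read(bs)
        blocknum += 1
        reporthook(blocknum, len(block) if report_actual else bs, size)


def run(report_actual):
    """Copy a 20000-byte payload and record every reporthook call."""
    calls = []
    payload = b"x" * 20000  # two full 8192-byte blocks plus 3616 bytes
    copy_with_hook(
        io.BytesIO(payload),
        io.BytesIO(),
        lambda blocknum, nbytes, total: calls.append((blocknum, nbytes, total)),
        size=len(payload),
        report_actual=report_actual,
    )
    return calls


old_calls = run(report_actual=False)
new_calls = run(report_actual=True)

# Second argument seen by the hook on each call:
print([c[1] for c in old_calls])  # block size every time
print([c[1] for c in new_calls])  # actual lengths after the initial call
```

With the original calls, a hook that estimates progress as `blocknum * bs` overshoots on the short final block. With the patched calls, summing the second argument over all calls with `blocknum >= 1` gives the exact payload length in this sketch.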
I notice that the patch doesn't apply to the svn head (2.6a0). But that's easily fixed and the idea still applies.
As the original author of the code being patched I believe my reason for doing it the old way was that I wanted the report hook to be called before the first block, which would let a GUI open up a dialog box before anything was read. The idea was that if the reads are really slow, you'd want the dialog box there right from the start. But this was rather naive, since the most likely source of delay is making the connection and getting the response header back, and the report hook isn't being called at all until all the headers have been seen.
The changed API to reporthook() needs to be documented very clearly. There's one call to reporthook() that still passes the block size instead of the actual data size. A naive implementation could be confused by this call, although it is easily recognized because it is the first call and the only one with blocknum equal to zero.
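A hook written for the patched behaviour could handle that first call explicitly. The following is only a sketch under that assumption: `ProgressTracker` and its attributes are hypothetical names, and the call sequence fed to it is written out by hand rather than produced by urllib.

```python
class ProgressTracker:
    """Hypothetical reporthook for the patched semantics.

    The call with blocknum == 0 carries the block size, not a data length,
    so it is used only for setup. Every later call adds the actual number
    of bytes read to a running total.
    """

    def __init__(self):
        self.block_size = None
        self.total = None
        self.received = 0

    def __call__(self, blocknum, nbytes, size):
        if blocknum == 0:
            # First call: nbytes is the read block size, no data yet.
            self.block_size = nbytes
            self.total = size if size >= 0 else None  # -1 means unknown
            self.received = 0
            return
        self.received += nbytes

    def fraction(self):
        """Return progress in [0, 1], or None if the total size is unknown."""
        if not self.total:
            return None
        return min(1.0, self.received / self.total)


# Hand-written call sequence for a 20000-byte transfer read in 8192-byte blocks.
tracker = ProgressTracker()
for call in [(0, 8192, 20000), (1, 8192, 20000), (2, 8192, 20000),
             (3, 3616, 20000), (4, 0, 20000)]:
    tracker(*call)
```

Keying the setup on `blocknum == 0` relies on the property noted above: it is the first call and the only one with that block number.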
I think this is a fine change -- as long as it isn't backported, since it is clearly a feature change. I do wonder "why bother", since most people using urllib don't care all that much about extreme details (I can't remember the last time I specified a reporthook), and most people caring about details don't like urllib and use something else (e.g. httplib, or urllib2).
So I guess I'm somewhere between +0 and -0 on this.