Issue 1351: Add getsize() to io instances

Created on 2007-10-28 11:09 by christian.heimes, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
py3k_sizeinfo.patch christian.heimes, 2007-10-28 11:09
Messages (8)
msg56877 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-28 11:09
I always missed a getsize() method on file objects. The patch adds a method getsize() to all io instances. The method returns a SizeInfo object which can print a human readable name or the bare size in bytes. The method is using os.fstat and falls back to the seek(0,2), tell() pattern.

>>> f = open("/etc/passwd")
>>> f.getsize()
<SizeInfo 1.7 KiB>
>>> int(f.getsize())
1721
>>> str(f.getsize())
'1.7 KiB'
>>> (f.getsize().sizeinfo())
(1.681, 1)

I'm going to provide unit tests and documentation if you like the feature.
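The fstat-then-seek strategy described in this message can be sketched roughly as follows. This is not the code from py3k_sizeinfo.patch, which is not reproduced in this thread; `stream_size` is a hypothetical helper name, and the sketch assumes a binary stream and returns a plain int rather than a SizeInfo object.

```python
import io
import os


def stream_size(stream):
    """Return the size in bytes of a binary stream (hypothetical helper)."""
    try:
        # Streams backed by a file descriptor can ask the OS directly.
        return os.fstat(stream.fileno()).st_size
    except (AttributeError, OSError):
        # In-memory streams have no descriptor: io.BytesIO.fileno() raises
        # io.UnsupportedOperation, which is a subclass of OSError.
        pass
    position = stream.tell()              # remember the current offset
    try:
        stream.seek(0, os.SEEK_END)       # jump to the end of the stream
        return stream.tell()              # the end offset is the size
    finally:
        stream.seek(position)             # restore the original offset


buf = io.BytesIO(b"hello world")
buf.seek(3)
print(stream_size(buf), buf.tell())
```

One caveat with this sketch: for a buffered file opened for writing, os.fstat reports only what has reached the descriptor, so unflushed data is not counted until flush() is called.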
msg56887 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-10-28 17:44
I'm skeptical:

- If you add getsize, why not getlastchangeddate, getowner, getpermissions?
- in general, streams (which really is the interface for file-like objects) don't have the notion of "size"; only some do.
- what is the purpose of the f.tell fragment? ie. why could that work when fstat doesn't?
msg56888 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-10-28 18:14
Martin v. Löwis wrote:
> I'm skeptical:
>
> - If you add getsize, why not getlastchangeddate, getowner, getpermissions?

getowner() etc. work only with file based streams and not with memory buffers. getsize() works with every concrete class in io.py

> - in general, streams (which really is the interface for file-like
> objects) don't have the notion of "size"; only some do.

I understand that getsize() doesn't make sense for e.g. a socket based stream. However the implementation of getsize() works with memory buffers and file descriptors

> - what is the purpose of the f.tell fragment? ie. why could that work
> when fstat doesn't?

The tell(), seek(0,2) is a generic fall back for io instances that aren't based on a file descriptor. It's required for BytesIO and StringIO. However I could come up with an implementation for BytesIO that queries the buffer directly.
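The idea of a BytesIO implementation that queries the buffer directly, instead of seeking, can be illustrated with an API that exists in later Python releases. This is a sketch under assumptions: `bytesio_size` is a hypothetical name, and io.BytesIO.getbuffer() was added in Python 3.2, so it was not available when this message was written.

```python
import io


def bytesio_size(buf):
    """Return the number of bytes held by an io.BytesIO (hypothetical helper)."""
    # getbuffer() exposes the underlying storage as a memoryview without
    # copying it. The view must be released before the buffer can be
    # resized again, which the with-block takes care of.
    with buf.getbuffer() as view:
        return view.nbytes


buf = io.BytesIO(b"abc")
buf.seek(1)
print(bytesio_size(buf), buf.tell())
```

Unlike the seek-based fallback, this never touches the stream position, so there is nothing to restore afterwards.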
msg56928 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-29 20:49
I'm -1 myself. I've rarely needed this -- if I wanted to know the size, I was almost always going to read the data into memory anyway, so why not just read it and then ask how much you got? For files on the filesystem there's os.path.getsize().

If I ever were to let this in, here's some more criticism:

(a) the SizeInfo class is overkill. getsize() should just return an int.
(b) getsize() should check self.seekable() first and raise the appropriate error if the file isn't seekable.
(c) os.fstat() is much less likely to work than the tell-seek-tell-seek sequence, so why not use that everywhere?
(d) people will expect to use this on text files, but of course the outcome will be in bytes, hence useless.
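Points (a) through (c) together describe a simpler shape for the method: check seekable(), run the tell-seek-tell-seek sequence, and return an int. A minimal sketch of that shape follows; `seekable_size` is a hypothetical name, and raising io.UnsupportedOperation is an assumption about what "the appropriate error" would be.

```python
import io
import os


def seekable_size(stream):
    """Size in bytes via tell-seek-tell-seek (hypothetical helper)."""
    if not stream.seekable():
        # Assumed error type; the thread does not name a specific exception.
        raise io.UnsupportedOperation("stream is not seekable")
    here = stream.tell()            # tell: remember the current offset
    stream.seek(0, os.SEEK_END)     # seek: move to the end
    end = stream.tell()             # tell: the end offset is the size
    stream.seek(here)               # seek: go back to where we started
    return end


buf = io.BytesIO(b"0123456789")
buf.seek(4)
print(seekable_size(buf), buf.tell())
```

For a file that is addressed by path rather than by an open stream, os.path.getsize(path) remains the shorter route, as the message notes.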
msg57253 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-11-08 14:59
> (a) the SizeInfo class is overkill. getsize() should just return an int.

But I like overkill :)

> (b) getsize() should check self.seekable() first and raise the appropriate error if the file isn't seekable.

That's easy to implement

> (c) os.fstat() is much less likely to work than the tell-seek-tell-seek sequence, so why not use that everywhere?

fstat doesn't have concurrency problems in multi threaded apps. I haven't profiled it but I would guess that fstat is also faster than tell seek.

> (d) people will expect to use this on text files, but of course the outcome will be in bytes, hence useless.

I could rename the method to getfssize, getbytesize, getsizeb ... to make clear that it doesn't return the amount of chars but the amount of used bytes.
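The bytes-versus-characters mismatch behind point (d) is easy to reproduce. The sketch below is an illustration only; the sample string and variable names are made up for the example, and UTF-8 is an assumed encoding.

```python
import io

# Ten characters, two of which (U+00EF and U+00E9) need two bytes in UTF-8.
text = "na\u00efve caf\u00e9"

raw = io.BytesIO(text.encode("utf-8"))
wrapper = io.TextIOWrapper(raw, encoding="utf-8")

char_count = len(wrapper.read())    # what a reader of the text stream sees
byte_count = len(raw.getvalue())    # what a byte-based size would report

print(char_count, byte_count)       # prints: 10 12
```

Any size derived from fstat or from seeking the underlying binary stream reports the second number, which is why a name such as getbytesize was floated.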
msg57289 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-11-09 00:28
Sorry, I still don't like it. You'll have to come up with a darned good use case to justify this.
msg57309 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-11-09 15:16
Does "it's convenient and I'm too lazy to address it in my code whenever the problem arises?" count as a darn good use case? No? Mh, I thought so :)
msg57341 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-11-09 23:55
Ok, I'm rejecting it now based on the YAGNI argument Guido brought up, and based on my own concerns.
History
Date User Action Args
2022-04-11 14:56:27 admin set github: 45692
2008-01-06 22:29:45 admin set keywords: - py3k, versions: Python 3.0
2007-11-09 23:55:06 loewis set status: open -> closed, resolution: rejected, messages: +
2007-11-09 15:16:36 christian.heimes set messages: +
2007-11-09 00:28:55 gvanrossum set messages: +
2007-11-08 14:59:56 christian.heimes set priority: low, keywords: + py3k, patch, messages: +
2007-10-29 20:49:11 gvanrossum set nosy: + gvanrossum, messages: +
2007-10-28 18:14:43 christian.heimes set messages: +
2007-10-28 17:44:25 loewis set nosy: + loewis, messages: +
2007-10-28 11:09:43 christian.heimes create