Issue 2898: Add memory footprint query

Created on 2008-05-17 10:44 by schuppenies, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name        Uploaded                       Description
footprint.patch  schuppenies, 2008-05-17 10:44  Patch against 2.6 trunk, revision 63363
sizeof.patch     schuppenies, 2008-05-29 13:09  Patch against 2.6 trunk, revision 63363
Messages (21)
msg66989 - (view) Author: Robert Schuppenies (schuppenies) * (Python committer) Date: 2008-05-17 10:44
I propose a patch which allows querying the memory footprint of an object. By calling 'footprint(o)', a Python developer can retrieve the size of any Python object. Only the size of the object itself is returned; the size of any referenced objects is ignored.

The patch implements a generic function to compute the object size. This works in all but a few cases. One of these exceptions is the dictionary with its particular table implementation. Such cases can be handled by implementing an optional method in C. This would also be the case for third-party implementations with unusual type definitions.

One advantage of this approach is that the object size can be computed at the level where an object is allocated, without requiring complex computations and considerations at higher levels.

I am not completely happy with the name 'footprint', but I think 'sizeof' would be confused with plain 'size', and 'memory_usage' was somewhat too long to type conveniently.

The current tests pass on linux32 and linux64, but the test suite is not complete yet.

This patch is part of my Google Summer of Code project on Python memory profiling (http://code.google.com/soc/2008/psf/appinfo.html?csaid=13F0E9C8B6E064EF).

Also, this is my first patch, so please let me know where I missed something, did not follow coding conventions, or made wrong assumptions.
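Editor's illustration: a minimal sketch of the "size of the object itself only" semantics proposed above. It uses `sys.getsizeof`, the name the thread settles on later (msg67481), and assumes CPython. Exact byte counts vary by version and platform, so none are shown.

```python
import sys

# A large string and a list that merely references it.
payload = "x" * 10_000
container = [payload]

# Shallow sizes: the list accounts for its own header and one pointer
# slot, not for the string it refers to.
payload_size = sys.getsizeof(payload)
container_size = sys.getsizeof(container)

print("string:", payload_size)
print("list holding the string:", container_size)
```

The list reports far fewer bytes than the string it contains, which is the behaviour the proposal describes: referenced objects are not counted.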
msg66990 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2008-05-17 10:49
Can't you write this as a simple Python function using type.__basicsize__ and type.__itemsize__? In any case, if this is added somewhere it should not be a builtin. This operation is nowhere near useful enough to be one.
msg66991 - (view) Author: Robert Schuppenies (schuppenies) * (Python committer) Date: 2008-05-17 11:00
> Can't you write this as a simple Python function using
> type.__basicsize__ and type.__itemsize__?

Yes, it would be possible and has been done, e.g. http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/546530. The problem is though, that it requires handling of all special cases externally. Any changes need to be addressed separately and unknown type definitions cannot be addressed at all. Also I figured the programmer implementing a type would know best about its size. Another point is different architectures which result in different object sizes.

> In any case, if this is added somewhere it should not be a builtin.

What place would you consider to be appropriate?
msg66992 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2008-05-17 11:02
Such implementation-specific things usually went into the sys module.
msg66994 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-05-17 13:46
It's actually not possible, in general, to compute the memory consumption of an object using basicsize and itemsize. An example is the dictionary, where there is no way to find out how many slots are currently allocated. Even for the things such as lists where the formula basicsize+len*itemsize would be correct it may fail, e.g. a list reports its itemsize as zero, even though each list item consumes four bytes (on a 32-bit system). I don't really see a problem with calling it sizeof, so I would then propose sys.sizeof as the appropriate location.
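Editor's illustration: a short sketch of why the basicsize/itemsize formula falls short, assuming a current CPython build. The helper `naive_size` is hypothetical and exists only for this example; `__basicsize__`, `__itemsize__` and `__sizeof__` are the real attributes.

```python
import struct

POINTER_SIZE = struct.calcsize("P")  # bytes per pointer on this build

# list and dict keep their items in a separately allocated buffer or
# table, so the type itself reports an item size of zero.
list_itemsize = list.__itemsize__
dict_itemsize = dict.__itemsize__
# tuple stores its item pointers inline, so it reports the pointer size.
tuple_itemsize = tuple.__itemsize__


def naive_size(obj):
    """basicsize + len * itemsize, the formula discussed above."""
    t = type(obj)
    return t.__basicsize__ + len(obj) * t.__itemsize__


numbers = list(range(1000))
naive = naive_size(numbers)       # ignores the item buffer entirely
reported = numbers.__sizeof__()   # the type's own accounting

print(list_itemsize, dict_itemsize, tuple_itemsize, POINTER_SIZE)
print(naive, reported)
```

For the list, the naive formula collapses to the basic size alone, while the type's own `__sizeof__` also covers the pointer buffer.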
msg66995 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-05-17 13:54
Proposals like this have been rejected in the past. Memory consumption is an evasive concept. Lists over-allocate space, there are freelists, there are immortal objects, the python memory allocator may hang-on to space thought to be available, the packing and alignment of structures varies across implementations, the system memory allocator may assign much larger chunks than are needed for a single object, and the memory may not be freed back to the system. Because of these issues, it is not that meaningful to say the object x consumes y bytes.
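Editor's illustration: the over-allocation mentioned above can be observed from Python, assuming CPython and `sys.getsizeof` (the function this issue eventually added). The helper `growth_steps` is a hypothetical name used only here.

```python
import sys


def growth_steps(n):
    """Return the shallow size of a list after each of n appends."""
    items = []
    sizes = []
    for i in range(n):
        items.append(i)
        sizes.append(sys.getsizeof(items))
    return sizes


sizes = growth_steps(64)
distinct = sorted(set(sizes))

print("appends:", len(sizes))
print("distinct sizes seen:", len(distinct))
```

The reported size grows in steps rather than on every append, because the list reserves spare slots each time it resizes.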
msg66996 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-05-17 14:10
> Proposals like this have been rejected in the past. Memory consumption
> is an evasive concept. Lists over-allocate space

That issue is addressed in this patch.

> there are freelists,

but they allocate just an upper bound.

> there are immortal objects, the python memory allocator may hang-on to
> space thought to be available

These issues are orthogonal to the memory consumption of a single object.

> the packing and alignment of structures
> varies across implementations

This is addressed in the current patch.

> the system memory allocator may assign
> much larger chunks than are needed for a single object

While true in general, this is not true in practice - in particular, when objects get allocated through pymalloc.

> and the memory
> may not be freed back to the system. Because of these issues, it is
> not that meaningful to say the object x consumes y bytes.

This is not true. It is meaningful to say that (and many that you noted are independent from such a statement, as they say things for the whole interpreter, not an individual object).

The patch meets a real need, and is the minimum amount of code that actually *has* to be implemented in the virtual machine, to get a reasonable analysis of the total memory consumption. Please be practical here, not puristic.
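Editor's illustration: one way a per-object size query can feed an analysis of total consumption, as the message above suggests. This is a sketch under assumptions, not the approach of the patch: `total_size` is a hypothetical helper, and it relies on `gc.get_referents` to find referenced objects. It is only sensible for containers of built-in values; instances of user-defined classes also refer to their class, which would pull in much more.

```python
import gc
import sys


def total_size(root):
    """Sum sys.getsizeof over every object reachable from root, once each."""
    seen = set()
    stack = [root]
    total = 0
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue  # shared objects are counted a single time
        seen.add(id(obj))
        total += sys.getsizeof(obj)
        stack.extend(gc.get_referents(obj))
    return total


data = {"numbers": [1, 2, 3], "text": "x" * 1000}
shared = "y" * 1000
pair = [shared, shared]

print(sys.getsizeof(data), total_size(data))
print(sys.getsizeof(pair), total_size(pair))
```

The per-object query stays shallow, and the traversal policy (what to follow, how to treat shared objects) lives outside the interpreter.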
msg67009 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2008-05-17 18:52
Lists will need a custom tp_footprint then, too. Or, if we call it sizeof, the slot should be tp_sizeof. BTW, is a new slot necessary, or can it just be a type method called __sizeof__?
msg67011 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-05-17 19:18
> Lists will need a custom tp_footprint then, too.

True.

> BTW, is a new slot necessary, or
> can it just be a type method called __sizeof__?

It wouldn't be a type method, but a regular method on the specific type, right? I think that would work as well.
msg67016 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-05-17 21:04
Guido, recently you've been opposed to adding more slots. Any opinions on this one? Also, is this something you want an additional builtin for?
msg67063 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-05-19 15:09
I'm torn about the extra slot; I'd rather not add one, but I can't see how to make this flexible enough without one. It should definitely not be a built-in; the sys module is fine though (e.g. sys.getrefcount() lives there too).
msg67075 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-05-19 20:48
> I'm torn about the extra slot; I'd rather not add one, but I can't see
> how to make this flexible enough without one.

I think adding a default __sizeof__ implementation into object (__basicsize__ + len()*__itemsize__), plus overriding that in subclasses, should do the trick. Not adding the default into object would cause an exception to be raised whenever sys.sizeof checks for __sizeof__, which is fairly expensive.

Having to look __sizeof__ up in the class dictionary, and creating an argument list, is still fairly expensive (given that the application we have in mind will apply sizeof to all objects, repeatedly), however, this being a debugging facility, this overhead is probably ok.
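Editor's illustration: a sketch of the "default in object, override where needed" design described above, as it can be exercised from Python on a current CPython. The `Packet` class is hypothetical. Its override adds the size of a buffer it owns, purely to show the mechanics of overriding.

```python
import sys

# The default inherited from object: the type's basic size, plus items
# if the type has any.
plain = object()
default_size = plain.__sizeof__()


class Packet:
    """Hypothetical type that owns a buffer the default cannot see."""

    def __init__(self, length):
        self._buffer = bytearray(length)

    def __sizeof__(self):
        # Start from the inherited default and add what this type owns.
        return super().__sizeof__() + self._buffer.__sizeof__()


packet = Packet(4096)
own = packet.__sizeof__()
reported = sys.getsizeof(packet)  # calls __sizeof__; may add GC overhead

print(default_size, own, reported)
```

`sys.getsizeof` picks up the override through the normal method lookup, so no extra type slot is involved.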
msg67438 - (view) Author: Robert Schuppenies (schuppenies) * (Python committer) Date: 2008-05-28 07:35
I tried to implement a magic method __sizeof__() for the type object which should be callable for type objects and type itself. But calling __sizeof__ results in an error message

>>> type.__sizeof__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: descriptor '__sizeof__' of 'type' object needs an argument

Debugging it I found that type_getattro will (1) look for the attribute in the metatype, (2) look in tp_dict of this type, and (3) use the descriptor from the metatype.

I actually want it to perform (3), but since type is its own metatype (2) will be triggered. This then results in the need for an argument. The same behavior occurs for all type instances, i.e. classes.

Is my understanding correct? How would it be possible to invoke __sizeof__() on the type 'type' and not on the object 'type'?

My first approach did the same for object, that is a magic __sizeof__() method linked to object, but it gets ignored when invoked on classes or types. Now from my understanding everything is an object, thus also classes and types. isinstance seems to agree with me

>>> isinstance(int, object)
True

Any suggestions on that?

thanks,
robert
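Editor's illustration: the lookup behaviour described above can still be reproduced on a current CPython. This sketch only demonstrates the symptom and the explicit route through the metatype. The helper `call_without_argument` and the class `Example` are hypothetical names.

```python
import sys


def call_without_argument(func):
    """Call func() and return the exception type raised, or None."""
    try:
        func()
    except TypeError as exc:
        return type(exc)
    return None


class Example:
    pass


# Looked up on a class, __sizeof__ resolves to the unbound descriptor
# meant for instances, so calling it with no argument fails.
error_on_type = call_without_argument(type.__sizeof__)
error_on_class = call_without_argument(Example.__sizeof__)

# Going through the metatype explicitly measures the class object itself.
class_size = type(Example).__sizeof__(Example)
reported = sys.getsizeof(Example)

print(error_on_type, error_on_class)
print(class_size, reported)
```

Passing the class explicitly to the metatype's method is the manual form of what a size function has to do internally when it is handed a class.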
msg67440 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2008-05-28 07:48
You probably just need to make the method a class method -- see METH_CLASS.
msg67442 - (view) Author: Robert Schuppenies (schuppenies) * (Python committer) Date: 2008-05-28 08:13
thanks, that did the trick.
msg67481 - (view) Author: Robert Schuppenies (schuppenies) * (Python committer) Date: 2008-05-29 08:11
The attached patch implements the sizeof functionality as a sys module function. __sizeof__ is implemented by object as an instance method, by type as a class method, as well as by types whose size cannot be computed from basicsize, itemsize and ob_size. sys.getsizeof() has some work-arounds to deal with type instances and old-style classes.
msg67489 - (view) Author: Robert Schuppenies (schuppenies) * (Python committer) Date: 2008-05-29 13:09
Nick Coghlan helped me to clear my 'metaclass confusion' so here is a patch without an additional __sizeof__ for type objects.
msg67570 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-05-31 10:06
The patch looks fine to me, please apply. Don't forget to add a Misc/NEWS entry.
msg67595 - (view) Author: Robert Schuppenies (schuppenies) * (Python committer) Date: 2008-06-01 16:22
Applied in r63856.
msg68372 - (view) Author: Jean Brouwers (MrJean1) Date: 2008-06-18 19:49
Three questions on the sizeof.patch:

1) In the first line of function dict_sizeof()

    + res = sizeof(PyDictObject) + sizeof(mp->ma_table);

is the sizeof(mp->ma_table) counted twice?

2) Since functions list_sizeof and dict_sizeof return the allocated size, including the over-allocation, should function string_sizeof not include the sentinel null character?

3) Are tuples left out on purpose? If not, here is an implementation for Objects/tupleobject.c:

....
static PyObject *
tuple_sizeof(PyTupleObject *v)
{
    Py_ssize_t res;

    res = _PyObject_SIZE(&PyTuple_Type) + Py_SIZE(v) * sizeof(void*);
    return PyInt_FromSsize_t(res);
}

PyDoc_STRVAR(sizeof_doc,
"T.__sizeof__() -- size of T in bytes");

....
static PyMethodDef tuple_methods[] = {
    {"__getnewargs__", (PyCFunction)tuple_getnewargs, METH_NOARGS},
    {"__sizeof__", (PyCFunction)tuple_sizeof, METH_NOARGS, sizeof_doc},
....

/Jean Brouwers
msg68377 - (view) Author: Robert Schuppenies (schuppenies) * (Python committer) Date: 2008-06-18 22:10
Jean Brouwers wrote:
> 1) In the first line of function dict_sizeof()
> + res = sizeof(PyDictObject) + sizeof(mp->ma_table);
> is the sizeof(mp->ma_table) counted twice?

Yes, you are right. I'll fix this.

> 2) Since functions list_sizeof and dict_sizeof return the allocated
> size, including the over-allocation, should function string_sizeof not
> include the sentinel null character?

Isn't this addressed by taking PyStringObject.ob_sval into account? It is allocated with 1 char length and thus always included. If I understand the creation of strings correctly, the corresponding memory is always allocated with

    PyObject_MALLOC(sizeof(PyStringObject) + size)

which should mean that the space for the null terminating character is included in the sizeof(PyStringObject).

>
>
> 3) Are tuples left out on purpose?

No, that slipped past the initial patch. I corrected it in r64230.

> ....
> static PyObject *
> tuple_sizeof(PyTupleObject *v)
> {
>     Py_ssize_t res;
>
>     res = _PyObject_SIZE(&PyTuple_Type) + Py_SIZE(v) *
> sizeof(void*);
>     return PyInt_FromSsize_t(res);
> }
> ....

Your implementation is like the changes I applied, with one difference. The basicsize of a tuple is defined as

    "sizeof(PyTupleObject) - sizeof(PyObject *)"

When a tuple's memory is allocated, the required space is computed roughly like this

    (typeobj)->tp_basicsize + (nitems)*(typeobj)->tp_itemsize

Thus, I understand the memory allocated by a tuple to be

    res = PyTuple_Type.tp_basicsize + Py_SIZE(v) * sizeof(PyObject *);
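Editor's illustration: the tuple formula from the last message, expressed in Python and checked against what a current CPython reports. `tuple_payload_size` is a hypothetical helper, and `struct.calcsize("P")` stands in for `sizeof(PyObject *)`. The sketch assumes CPython.

```python
import struct
import sys

POINTER_SIZE = struct.calcsize("P")  # stands in for sizeof(PyObject *)


def tuple_payload_size(t):
    """tp_basicsize + Py_SIZE(t) * sizeof(PyObject *), in Python terms."""
    return tuple.__basicsize__ + len(t) * POINTER_SIZE


sample = (1, "two", 3.0)
computed = tuple_payload_size(sample)
reported = sample.__sizeof__()
with_overhead = sys.getsizeof(sample)  # may add garbage-collector overhead

print(computed, reported, with_overhead)
```

Because the basic size already excludes the one item slot declared in the struct, every item is counted exactly once through the per-item term.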
History
Date User Action Args
2022-04-11 14:56:34 admin set github: 47147
2008-10-13 19:48:15 jcea set nosy: gvanrossum, loewis, georg.brandl, rhettinger, facundobatista, jcea, MrJean1, schuppenies
2008-06-18 22:10:51 schuppenies set messages: +
2008-06-18 19:49:19 MrJean1 set nosy: + MrJean1; messages: +
2008-06-01 16:22:53 schuppenies set status: open -> closed; messages: +
2008-05-31 10:07:04 loewis set assignee: gvanrossum -> schuppenies; resolution: accepted; messages: +
2008-05-29 13:09:45 schuppenies set files: + sizeof.patch; messages: +
2008-05-29 13:08:31 schuppenies set files: - sizeof.patch
2008-05-29 08:11:26 schuppenies set files: + sizeof.patch; messages: +
2008-05-28 16:45:12 jcea set nosy: + jcea
2008-05-28 08:13:54 schuppenies set messages: +
2008-05-28 07:48:16 georg.brandl set messages: +
2008-05-28 07:35:57 schuppenies set messages: +
2008-05-21 01:48:07 facundobatista set nosy: + facundobatista
2008-05-19 20:48:11 loewis set messages: +
2008-05-19 15:09:37 gvanrossum set messages: +
2008-05-17 21:04:17 rhettinger set assignee: gvanrossum; messages: +; nosy: + gvanrossum
2008-05-17 19:18:13 loewis set messages: +
2008-05-17 18:52:30 georg.brandl set messages: +
2008-05-17 14:11:32 loewis set messages: +
2008-05-17 13:55:04 rhettinger set nosy: + rhettinger; messages: +
2008-05-17 13:46:48 loewis set nosy: + loewis; messages: +
2008-05-17 11:02:43 georg.brandl set messages: +
2008-05-17 11:00:27 schuppenies set messages: +
2008-05-17 10:50:22 georg.brandl set nosy: + georg.brandl; messages: +
2008-05-17 10:44:29 schuppenies create