Issue 3122: sys.getsizeof() gives an AttributeError for _sre objects. (original) (raw)

Created on 2008-06-16 10:51 by schuppenies, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
_sre_sizeof.patch schuppenies,2008-06-16 10:51 Patch against 2.6 trunk, revision 64299
sizeof.patch amaury.forgeotdarc,2008-06-26 22:08
sizeof2.patch schuppenies,2008-07-03 13:31
Messages (15)
msg68266 - (view) Author: Robert Schuppenies (schuppenies) * (Python committer) Date: 2008-06-16 10:51
>>> import re >>> import sys >>> r = re.compile('') >>> sys.getsizeof(r) Traceback (most recent call last): File "", line 1, in AttributeError: __sizeof__ This applies to objects of the types _sre.SRE_Pattern, _sre.SRE_Scanner, and _sre.SRE_Match. The attached patch addresses this issue.
msg68788 - (view) Author: Facundo Batista (facundobatista) * (Python committer) Date: 2008-06-26 15:47
Robert, do you have a test suite for the sizeof functionality? If not, you should start one ASAP, ;) This test should be included in that suit...
msg68789 - (view) Author: Robert Schuppenies (schuppenies) * (Python committer) Date: 2008-06-26 16:01
What would be a good way to identify *all* possible types? When I started, I included all objects in /Objects, but obviously this is not sufficient.
msg68801 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008-06-26 21:10
I think it would be better to give a TypeError rather than an AttributeError for objects that don't support __sizeof__ as per other special methods.
msg68802 - (view) Author: Robert Schuppenies (schuppenies) * (Python committer) Date: 2008-06-26 21:40
The attribute error is caused by pattern_getattr, which tries to find __sizeof__, fails and then sets the error message. I don't know if casting the error is the right thing to do. Actually, sys.getsizeof() should work on any type. Another suggestion was a that sys.getsizeof allows one optional argument for a default size. If the default argument is provided, sys.getsizeof will not throw an exception (if the __sizeof__ method is missing or for any other error), but instead return the given default size. Still, an agreement on the right error message is needed.
msg68804 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-06-26 22:08
I wondered why getsizeof fails for _sre.SRE_Pattern objects when it succeeds for _socket.socket or struct.Struct. It turns out that _sre.SRE_Pattern defines the tp_getattr slot, and this prevents attribute lookup from searching the base class (object). This is a pity, because the base implementation (which use tp_basicsize) does the right thing for many objects. So I borrowed some code from the __format__ method, and here is a patch. Now classes are not handled specially any more; the distinction is between old-style instances, and all other objects. All tests pass, but I may have missed something important...
msg68805 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-06-26 22:24
Robert, looking at your patch for the _sre module, I noticed that MatchObject.__sizeof__ includes the sizeof of one of its member (regs: a tuple of (begin, end) pairs). Why this one and not the others? I thought the rule (if there is a rule) was to include the used memory for members that are *not* python objects.
msg68818 - (view) Author: Robert Schuppenies (schuppenies) * (Python committer) Date: 2008-06-27 07:12
You are right, the rule is to not include referenced objects. But, and this has been rather informal up to now, I handled transparently cached information as something that is added to the memory used by an object (see unicode.defenc). The same I applied to MatchObject.regs. The rational being that the user cannot know wether the match positions are cached or not. What do you think?
msg68819 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-06-27 07:52
I don't understand the relation between "the member is cached" and "it counts in the object's sizeof". What does "cached" mean? Does 'self.x = 3' create a cached member?
msg68820 - (view) Author: Robert Schuppenies (schuppenies) * (Python committer) Date: 2008-06-27 09:05
I was refering to the fact that something on the C level is cached without the Python user ever noticing that it wasn't there before. Caching itself is no criteria, but allocating memory without giving the user a chance to find out should be (in this context). Maybe I am missing something here, but calling match.regs creates a tuple which is not there before, but cannot be removed afterwards. This is why I handled it separately.
msg68821 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-06-27 09:49
> Caching itself is no criteria, but allocating memory without giving > the user a chance to find out should be (in this context). > ... calling match.regs creates a > tuple which is not there before, but cannot be removed > afterwards. This is why I handled it separately. Well, so why only include the tuple, and not objects inside the tuple? They may also count in allocated memory (not often: small numbers are shared) Does the same criteria apply to function.func_defaults and function.doc members? Both can be None, sizeof(None) would be added twice. Would you say the same for property members? class C(object): def setx(self): self.__x = 42 x = property(lambda self: self.__x) the value is not there before you call o.setx(), and cannot be removed afterwards. IMO, the criteria (to decide whether a container should include a particular PyObject member in its sizeof) should not include the way the member is created, or who created it, but only the current layout in memory. For example: can other objects hold references to this member, does it appear in gc.objects... And I propose this simple algorithm: do not include any referenced PyObject :-)
msg68826 - (view) Author: Robert Schuppenies (schuppenies) * (Python committer) Date: 2008-06-27 12:22
Okay, I get the point. With including unicode.defenc I already included a referenced object which was ruled out in the first place. And this for a good reason. What bugs me, though, is that this leaves out a potentially significant amount of memory. I know that this is already the case for shared objects (e.g. the above mentioned numbers) or malloc overhead, but adding yet another exception bothers me. On the other hand, since it's hidden behind the C API, I don't know how to address this problem. Maybe just give it some text in the documentation is sufficient.
msg69148 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-07-02 23:47
It now works for SRE objects in the py3k branch since r64672, but my patch is still needed for types that define tp_getattr.
msg69199 - (view) Author: Robert Schuppenies (schuppenies) * (Python committer) Date: 2008-07-03 13:31
Amaury, I was testing your patch and it turns out, that it will ignore any __sizeof__ attribute which may be available through getattr. I adapted it a bit, so now getsizeof will try to call the method on the passed object first, and if it fails or the object is a type, the code proposed by you will be executed. This also deals with old-style class instances. The match_sizeof function in the patch is just to showcase the example. What do you think?
msg69767 - (view) Author: Robert Schuppenies (schuppenies) * (Python committer) Date: 2008-07-16 07:16
Fixed in r64842.
History
Date User Action Args
2022-04-11 14:56:35 admin set github: 47372
2008-07-16 07:16:12 schuppenies set status: open -> closedresolution: fixedmessages: + keywords:patch, patch
2008-07-03 13:31:48 schuppenies set keywords:patch, patchfiles: + sizeof2.patchmessages: +
2008-07-02 23:47:41 amaury.forgeotdarc set keywords:patch, patchmessages: +
2008-06-27 12:22:09 schuppenies set keywords:patch, patchmessages: +
2008-06-27 09:49:01 amaury.forgeotdarc set keywords:patch, patchmessages: +
2008-06-27 09:05:38 schuppenies set keywords:patch, patchmessages: +
2008-06-27 07:52:24 amaury.forgeotdarc set keywords:patch, patchmessages: +
2008-06-27 07:12:14 schuppenies set keywords:patch, patchmessages: +
2008-06-26 22:24:29 amaury.forgeotdarc set keywords:patch, patchmessages: +
2008-06-26 22:08:16 amaury.forgeotdarc set keywords:patch, patchfiles: + sizeof.patchmessages: + nosy: + amaury.forgeotdarc
2008-06-26 21:40:52 schuppenies set keywords:patch, patchmessages: +
2008-06-26 21:10:33 benjamin.peterson set keywords:patch, patchnosy: + benjamin.petersonmessages: +
2008-06-26 16:01:55 schuppenies set keywords:patch, patchmessages: +
2008-06-26 15:47:08 facundobatista set keywords:patch, patchnosy: + facundobatistamessages: +
2008-06-16 10:51:25 schuppenies create