Issue 14520: Buggy Decimal.__sizeof__ - Python tracker

Created on 2012-04-07 10:55 by pitrou, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (10)
msg157726 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-07 10:55
I'm not sure __sizeof__ is implemented correctly:

>>> from decimal import Decimal
>>> import sys
>>> d = Decimal(123456789123456798123456789123456798123456789123456798)
>>> d
Decimal('123456789123456798123456789123456798123456789123456798')
>>> sys.getsizeof(d)
24

... looks too small.
msg157729 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-04-07 11:29
It isn't implemented at all. The Python version also always returns 96, irrespective of the coefficient length. Well, arguably the coefficient is a separate object in the Python version:

96
>>> sys.getsizeof(d._int)
212

For the C version I'll do the same as in longobject.c.
msg157730 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-04-07 11:32
In full:

>>> d = Decimal(100000000000000000000000000000000000000000000000000000000000000000000)
>>> sys.getsizeof(d)
96
>>> sys.getsizeof(d._int)
212
msg157798 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-04-08 17:30
There are really two options:

a) if an object is a container, and the contained object is accessible to reflection (preferably through gc.get_referents), then the container shouldn't account for the size of the contained object.
b) if the contained object is not accessible (except for sys.getobjects() in a debug build), then the container should provide the total sum.

A memory debugger is supposed to find all objects (e.g. through gc.get_objects and gc.get_referents), eliminate duplicate references, and then apply sys.getsizeof to each object. This should then not leave out any memory, and not count any memory twice.
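The procedure described above can be sketched roughly as follows. This is a minimal sketch, not a real memory debugger: the helper name total_reachable_size is hypothetical, and it walks from a single root via gc.get_referents instead of enumerating everything returned by gc.get_objects().

```python
import gc
import sys


def total_reachable_size(root):
    """Sum sys.getsizeof over every object reachable from root, counting each once.

    Hypothetical helper for illustration only.
    """
    seen = set()      # ids of objects already counted
    stack = [root]    # objects still to visit
    total = 0
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue  # duplicate reference: do not count the memory twice
        seen.add(id(obj))
        total += sys.getsizeof(obj)
        # Objects exposed to reflection are visited and sized separately.
        stack.extend(gc.get_referents(obj))
    return total


shared = "x" * 1000
data = [shared, shared]   # two references to the same string
print(total_reachable_size(data))
```

Under option a), a container's __sizeof__ must leave out anything such a walk can reach on its own. Under option b), the hidden memory has to be included in the container's __sizeof__, because the walk never sees it.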
msg157857 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-04-09 16:14
In the C version of decimal, do distinct Decimal objects ever share coefficients? (This would be an obvious optimization for methods like Decimal.copy_negate; I don't know whether the C version applies such optimizations.) If there's potential for shared coefficients, that might make the "not count any memory twice" part tricky.
msg157860 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-04-09 16:30
> In the C version of decimal, do distinct Decimal objects ever share
> coefficients? (This would be an obvious optimization for methods
> like Decimal.copy_negate; I don't know whether the C version
> applies such optimizations.) If there's potential for shared
> coefficients, that might make the "not count any memory twice" part
> tricky.

I know of three strategies to deal with such a case:

a) expose the inner objects, preferably through tp_traverse, and don't account for them in the container,
b) find a "canonical" owner of the contained objects, and only account for them along with the canonical container,
c) compute the number N of shared owners, and divide the object size by N. Due to rounding, this may be somewhat incorrect.
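Strategy c) and its rounding caveat can be illustrated with a toy model. Everything here is an assumption made for illustration: SharedBuffer and Owner are hypothetical classes, nbytes stands in for memory that would be hidden inside a C struct, and nothing in this sketch reflects how _decimal is actually implemented.

```python
class SharedBuffer:
    """Hypothetical stand-in for a hidden, shared block of memory."""

    def __init__(self, nbytes):
        self.nbytes = nbytes
        self.owners = 0   # number of Owner objects sharing this buffer


class Owner:
    """Hypothetical container reporting only its share of the buffer."""

    def __init__(self, buf):
        self.buf = buf
        buf.owners += 1

    def __sizeof__(self):
        # Strategy c): divide the shared size by the number of owners.
        share = self.buf.nbytes // self.buf.owners
        return object.__sizeof__(self) + share


buf = SharedBuffer(1001)
a = Owner(buf)
b = Owner(buf)

# Add up only the portion each owner attributes to the shared buffer.
shares = sum(o.__sizeof__() - object.__sizeof__(o) for o in (a, b))
print(shares)   # 1000: one of the 1001 bytes is lost to integer division
```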
msg157861 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-04-09 16:54
Mark Dickinson <report@bugs.python.org> wrote:
> In the C version of decimal, do distinct Decimal objects ever share coefficients?

The coefficients are members of the mpd_t struct (libmpdec data type), and they are not exposed as Python objects or shared.

Cache locality is incredibly important: I have a patch that reserves a static coefficient of four words inside the PyDecObject. This patch speeds up _decimal by roughly another 30-40% for regularly sized decimals. If the decimal grows beyond that, libmpdec automatically switches to a dynamically allocated coefficient.

I think sharing would probably slow things down a bit.
msg157864 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-04-09 17:14
> and they are not exposed as Python objects or shared.

Okay, thanks. Sounds like this isn't an issue at the moment then.

+1 for having getsizeof report the total size used.
msg157886 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-04-09 19:33
New changeset 010aa5d955ac by Stefan Krah in branch 'default':
Issue #14520: Add __sizeof__() method to the Decimal object.
http://hg.python.org/cpython/rev/010aa5d955ac
msg157889 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-04-09 19:51
Thanks for the explanations. The new __sizeof__() method should now report the exact memory usage.
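A quick way to observe the effect of the fix is to compare a small and a large coefficient. This is a sketch only: the exact byte counts depend on the platform and Python build, and the pure-Python fallback implementation may report the same size for both, since its coefficient is a separate object.

```python
import sys
from decimal import Decimal

small = Decimal(1)
large = Decimal(10 ** 1000)   # coefficient with 1001 digits

# With the C implementation, the second number is expected to be larger
# because the dynamically allocated coefficient is included.
print(sys.getsizeof(small), sys.getsizeof(large))
```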
History
Date User Action Args
2022-04-11 14:57:28 admin set github: 58725
2012-04-09 19:51:51 skrah set status: open -> closed; resolution: fixed; messages: +; stage: resolved
2012-04-09 19:33:35 python-dev set nosy: + python-dev; messages: +
2012-04-09 17:14:18 mark.dickinson set messages: +
2012-04-09 16:54:38 skrah set messages: +
2012-04-09 16:30:24 loewis set messages: +
2012-04-09 16:14:27 mark.dickinson set messages: +
2012-04-09 16:09:11 mark.dickinson set nosy: + mark.dickinson
2012-04-08 17:30:56 loewis set nosy: + loewis; messages: +
2012-04-07 11:32:04 skrah set messages: +
2012-04-07 11:29:10 skrah set messages: +
2012-04-07 10:55:38 pitrou create