[Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

M.-A. Lemburg mal at egenix.com
Tue Dec 23 13:47:15 CET 2008


On 2008-12-22 22:45, Steven D'Aprano wrote:
> On Mon, 22 Dec 2008 11:20:59 pm M.-A. Lemburg wrote:
>> On 2008-12-20 23:16, Martin v. Löwis wrote:
>>>>> I will try next week to see if I can come up with a smaller,
>>>>> submittable example. Thanks.
>>>> These long exit times are usually caused by the garbage
>>>> collection of objects. This can be a very time-consuming task.
>>> I doubt that. The long exit times are usually caused by a bad
>>> malloc implementation.
>> With "garbage collection" I meant the process of Py_DECREF'ing the
>> objects in large containers or deeply nested structures, not the GC
>> mechanism for breaking circular references in Python. This will
>> usually also involve free() calls, so the malloc implementation
>> affects this as well. However, I've seen such long exit times on
>> both Linux and Windows, which have rather good malloc
>> implementations.
>>
>> I don't think there's much we can do about it at the interpreter
>> level. Deleting millions of objects takes time, and that's not
>> really surprising. It takes even longer if you have instances with
>> __del__() methods written in Python.
>
> This behaviour appears to be specific to deleting dicts, not deleting
> random objects. I haven't yet confirmed that the problem still exists
> in trunk (I hope to have time tonight or tomorrow), but in my previous
> tests deleting millions of items stored in a list of tuples completed
> in a minute or two, while deleting the same items stored as key:item
> pairs in a dict took 30+ minutes. I say "plus" because I never had the
> patience to let it run to completion; it could have been hours for all
> I know.

That's interesting. The dictionary dealloc routine doesn't give any hint as to why this should take longer than deallocating a list of tuples.
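For reference, here is a minimal sketch of the kind of comparison
Steven describes (Python 2.5-era syntax; the item count and the timing
scaffolding are mine, and absolute times will vary wildly with the
platform's malloc):

    import time

    N = 5 * 10**6   # hypothetical size; the quoted tests used "millions"

    # Millions of key/value pairs held in a dict ...
    d = dict((i, str(i)) for i in xrange(N))
    t0 = time.time()
    del d           # CPython Py_DECREFs every key and value right here
    print "dict freed in %.1f s" % (time.time() - t0)

    # ... versus the same data held as a list of tuples.
    L = [(i, str(i)) for i in xrange(N)]
    t0 = time.time()
    del L
    print "list freed in %.1f s" % (time.time() - t0)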

However, due to the way dictionary tables are allocated, you can end up with a table nearly twice the size actually needed for the number of items in the dictionary. At dictionary sizes like these, that means a lot of extra memory being allocated, certainly more than the corresponding list of tuples would use.
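A small illustration of the growth pattern (sys.getsizeof() only
appeared in Python 2.6, and the exact resize thresholds and byte
counts are CPython version- and platform-specific):

    import sys   # sys.getsizeof() requires Python 2.6+

    # CPython resizes a dict's table once it is about 2/3 full, and the
    # new table is several times the number of live items, so two dicts
    # that straddle a resize boundary occupy very different amounts of
    # memory for almost the same content.
    a = dict((i, None) for i in range(5))
    b = dict((i, None) for i in range(6))
    print sys.getsizeof(a), sys.getsizeof(b)   # b's table is far larger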

Applications that need a faster exit can resort to other (less clean) mechanisms for shortcutting the shutdown process.
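One such (less clean) mechanism, my example rather than one named in
this thread, is os._exit(): it terminates the process immediately
without deallocating anything. Since atexit handlers and buffered
output are skipped, flush whatever you still need first:

    import os
    import sys

    def fast_exit(status=0):
        """Exit without running the usual interpreter teardown."""
        sys.stdout.flush()   # os._exit() bypasses buffer flushing
        sys.stderr.flush()   # and atexit handlers entirely
        os._exit(status)     # no Py_DECREF storm over the huge dict

    # e.g. call fast_exit() instead of sys.exit() at the end of the run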

>> BTW: Rather than using a huge in-memory dict, I'd suggest either
>> using an on-disk dictionary such as the ones found in mxBeeBase,
>> or a database.
>
> The original poster's application uses 45GB of data. In my earlier
> tests, I've experienced the problem with ~300 megabytes of data:
> hardly what I would call "huge".

Times have changed, that's true :-)
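Regarding the on-disk suggestion above: mxBeeBase's API isn't shown in
this thread, but the standard library's shelve module (available in
2.5) sketches the same idea, a dict-like object whose contents live on
disk instead of in one huge in-memory table:

    import shelve

    db = shelve.open('bigmap.db')   # hypothetical filename
    try:
        # keys must be strings; values can be any picklable object
        db['some_key'] = ('any', 'picklable', 'value')
        print db['some_key']
    finally:
        db.close()   # closing is cheap: there is no giant in-memory
                     # object graph left for the interpreter to free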

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Dec 23 2008)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2008-12-02: Released mxODBC.Connect 1.0.0      http://python.egenix.com/

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/


