[Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
Steven D'Aprano steve at pearwood.info
Sat Dec 20 11:55:26 CET 2008
On Sat, 20 Dec 2008 09:02:38 pm Kristján Valur Jónsson wrote:
> Can you distill the program into something reproducible? Maybe with something slightly less than 45Gb but still exhibiting some degradation of exit performance? I can try to point our commercial profiling tools at it and see what it is doing.
> K
In November 2007, a similar problem was reported on the comp.lang.python newsgroup. 370MB was large enough to demonstrate the problem. I don't know if a bug was ever reported.
The thread starts here: http://mail.python.org/pipermail/python-list/2007-November/465498.html
or if you prefer Google Groups: http://preview.tinyurl.com/97xsso
and it describes extremely long times to populate and destroy large dicts even with garbage collection turned off.
My summary at the time was:
"On systems with multiple CPUs or 64-bit systems, or both, creating and/or deleting a multi-megabyte dictionary in recent versions of Python (2.3, 2.4, 2.5 at least) takes a LONG time, of the order of 30+ minutes, compared to seconds if the system only has a single CPU. Turning garbage collection off doesn't help."
I make no guarantee that the above is a correct description of the problem, only that this is what I believed at the time.
I'm afraid it is a very long thread, with multiple red herrings, lots of people unable to reproduce the problem, and the usual nonsense that happens on comp.lang.python.
I was originally one of the skeptics until I reproduced the original poster's problem. I generated a sample file of 8 million key/value pairs as a 370MB text file. Reading it into a dict took two and a half minutes on my relatively slow computer. But deleting the dict took more than 30 minutes, even with garbage collection switched off. Sample code reproducing the problem on my machine is here:
http://mail.python.org/pipermail/python-list/2007-November/465513.html
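For anyone who wants to try this without following the link, here is a minimal sketch of that kind of test. It is not the script from the linked post: it uses Python 3 syntax rather than 2.5, builds the pairs in memory instead of reading a 370MB file, and the function name and the made-up keys and values are assumptions for illustration only. Raise n towards 8 million to approach the size described above.

```python
import gc
import time


def time_dict_build_and_delete(n):
    """Return (build_seconds, delete_seconds) for a dict of n string pairs."""
    was_enabled = gc.isenabled()
    gc.disable()  # mirrors the report: the cyclic collector is switched off
    try:
        start = time.perf_counter()
        d = {}
        for i in range(n):
            d[str(i)] = str(i * 2)  # made-up short string keys and values
        build = time.perf_counter() - start

        start = time.perf_counter()
        del d  # last reference: CPython frees the dict and its contents here
        delete = time.perf_counter() - start
    finally:
        if was_enabled:
            gc.enable()  # restore the collector to its previous state
    return build, delete


if __name__ == "__main__":
    build, delete = time_dict_build_and_delete(1000000)
    print("build:  %.2f s" % build)
    print("delete: %.2f s" % delete)
```

If the delete time grows much faster than the build time as n increases, that would match the behaviour described in the thread. On a machine that does not show the problem, the two numbers should stay in the same ballpark.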
According to this post of mine:
http://mail.python.org/pipermail/python-list/2007-November/466209.html
deleting 8 million (key, value) pairs stored as a list of tuples was very fast. It was only if they were stored as a dict that deleting it was horribly slow.
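The comparison can be sketched roughly as follows. Again this is an illustration in Python 3 syntax, not the code from the linked post, and the helper names (make_pairs, time_teardown, compare) are hypothetical. It relies on CPython's reference counting, where del on the only reference frees the object immediately.

```python
import time


def make_pairs(n):
    """Build n (key, value) tuples of short, made-up strings."""
    return [(str(i), str(i * 2)) for i in range(n)]


def time_teardown(make_container):
    """Build a container, then time how long dropping it takes."""
    obj = make_container()
    start = time.perf_counter()
    del obj  # only reference, so the container is freed at this point
    return time.perf_counter() - start


def compare(n):
    """Return (list_seconds, dict_seconds) for tearing down n pairs."""
    as_list = time_teardown(lambda: make_pairs(n))
    # The temporary list is released as soon as dict() returns, so only
    # the dict keeps the strings alive when the timer starts.
    as_dict = time_teardown(lambda: dict(make_pairs(n)))
    return as_list, as_dict


if __name__ == "__main__":
    as_list, as_dict = compare(1000000)
    print("list of tuples: %.2f s" % as_list)
    print("dict:           %.2f s" % as_dict)
```

A large gap between the two figures would point at the dict teardown rather than at freeing the strings themselves.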
Please note that other people have tried and failed to replicate the problem. I suspect the fault (if it is one, and not human error) is specific to some combinations of Python version and hardware.
Even if this is a Will Not Fix, I'd be curious if anyone else can reproduce the problem.
Hope this is helpful,
Steven.
-----Original Message-----
From: python-dev-bounces+kristjan=ccpgames.com at python.org [mailto:python-dev-bounces+kristjan=ccpgames.com at python.org] On Behalf Of Mike Coleman
Sent: 19 December 2008 23:30
To: python-dev at python.org
Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
I have a program that creates a huge (45GB) defaultdict. (The keys are short strings, the values are short lists of pairs (string, int).) Nothing but possibly the strings and ints is shared.

The program takes around 10 minutes to run, but longer than 20 minutes to exit (I gave up at that point). That is, after executing the final statement (a print), it is apparently spending a huge amount of time cleaning up before exiting. I haven't installed any exit handlers or anything like that, all files are already closed and stdout/stderr flushed, and there's nothing special going on. I have done 'gc.disable()' for performance (which is hideous without it)--I have no reason to think there are any loops.

Currently I am working around this by doing an os.exit(), which is immediate, but this seems like a bit of hack. Is this something that needs fixing, or that has already been fixed?

Mike
-- Steven D'Aprano