[Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
Antoine Pitrou solipsis at pitrou.net
Tue Dec 23 01:34:53 CET 2008
> Now, we should find a way to benchmark this without having to steal Mike's machine and wait 30 minutes every time.
So, I seem to reproduce it. The following script takes about 15 seconds to run and allocates a 2 GB dict, which it deletes at the end (gc disabled, of course). With 2.4, deleting the dict takes ~1.2 seconds, while with 2.5 and higher (including 3.0) it takes ~3.5 seconds. Nothing spectacular, but the difference is clear.
Also, after the dict is deleted and before the program exits, you can witness (with ps or top) that 2.5 and higher have reclaimed 1 GB, while 2.4 has reclaimed nothing. There is a sleep() call at the end so that you have the time :-)
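If you want a check from inside the script itself, a small helper along these lines can print the resident set size at interesting points. This is a sketch of mine, not part of the script below, and it is Linux-only since it parses VmRSS out of /proc/self/status:

import re

def print_rss(label):
    # Linux-only: read VmRSS from /proc/self/status.
    # On other platforms, watch the process with ps or top instead.
    f = open("/proc/self/status")
    try:
        m = re.search(r"VmRSS:\s*(\d+) kB", f.read())
    finally:
        f.close()
    print ("%s: RSS = %d MB" % (label, int(m.group(1)) // 1024))

Calling print_rss() right after the del, and again after a pause, should make the 2.4 vs. 2.5 difference visible without an external tool.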
You can tune the memory occupation at the beginning of the script, but the lower it is, the more difficult it will be to witness a difference.
Regards
Antoine.
#######
import random
import time
import gc
import itertools

# Adjust this parameter according to your system RAM!
target_size = int(2.0 * 1024**3)   # 2.0 GB
pool_size = 4 * 1024

# This is a ballpark estimate: 60 bytes overhead for each
# { dict entry struct + float object + tuple object header },
# 1.3 overallocation factor for the dict.
target_length = int(target_size / (1.3 * (pool_size + 60)))

def make_dict():
    print ("filling dict up to %d entries..." % target_length)
    # 1. Initialize the dict from a set of pre-computed random keys.
    keys = [random.random() for i in range(target_length)]
    d = dict.fromkeys(keys)

    # 2. Build the values that will constitute the dict. Each value will, as
    # far as possible, span a contiguous `pool_size` memory area.

    # Over 256 bytes per alloc, PyObject_Malloc defers to the system malloc().
    # We avoid that by allocating tuples of smaller longs.
    int_size = 200
    # 24 roughly accounts for the long object overhead (YMMV)
    int_start = 1 << ((int_size - 24) * 8 - 7)
    int_range = range(1, 1 + pool_size // int_size)

    values = [None] * target_length
    # Maximize allocation locality by pre-allocating the values
    for n in range(target_length):
        values[n] = tuple(int_start + j for j in int_range)
        if n % 10000 == 0:
            print ("  %d iterations" % n)

    # The keys are iterated over in their original order rather than in
    # dict order, so as to randomly spread the values in the internal dict
    # table wrt. allocation address.
    for n, k in enumerate(keys):
        d[k] = values[n]
    print ("dict filled!")
    return d

if __name__ == "__main__":
    gc.disable()
    t1 = time.time()
    d = make_dict()
    t2 = time.time()
    print (" -> %.3f s." % (t2 - t1))

    print ("deleting dict...")
    t2 = time.time()
    del d
    t3 = time.time()
    print (" -> %.3f s." % (t3 - t2))

    print ("Finished, you can press Ctrl+C.")
    time.sleep(10.0)
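As a sanity check on the sizing arithmetic in the script, sys.getsizeof() (available in 2.6 and later) can confirm that the individual longs stay below pymalloc's 256-byte small-object threshold. The exact figure varies per build, so take the expected value as an assumption:

import sys

int_size = 200
int_start = 1 << ((int_size - 24) * 8 - 7)
# Expect something in the neighbourhood of int_size bytes, and in any
# case well under the 256-byte pymalloc threshold.
print (sys.getsizeof(int_start))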