msg19147 - (view) |
Author: Gottfried Ganßauge (ganssauge) |
Date: 2003-11-26 14:06 |
My application uses a shelve file which is created by another process using the same Python version. Before Python 2.3, using this shelve with the exact same application was almost twice as fast as a binary pickle containing the same data. Now, with Python 2.3, the same application is suddenly about 150 times slower than using the binary pickle. The usage is as follows:

    idx_dict = shelve.open (idx_dict_name, "r")
    ...
    while not infile.eof:
        index = get_index_from_somewhere_else()
        if not idx_dict.has_key (index):
            do_something(index)
        else:
            do_something_else(index)
    idx_dict.close()

Profiling revealed that most of the time is spent within UserDict. |
|
|
msg19148 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2003-11-27 09:17 |
Logged In: YES user_id=80475 I can reproduce a four-fold slowdown that persists even after the UserDict.DictMixin lines are commented out of shelve.py and bsddb/__init__.py. For me, the only thing that has changed is the underlying bsddb implementation. Let's see if your system is going somewhere else to get its shelving done. After the first line, add: idx_dict.has_key ([]) Then post the traceback here. Do that for both Py2.2 and Py2.3. Thank you. Also, post what a typical record in the index looks like and tell me how many entries are typically in idx_dict. That way, I can try to reproduce your timings with greater fidelity. Which OS are you using, and what are the minor bugfix version numbers of the Py2.2 and Py2.3 you are using? |
|
|
msg19149 - (view) |
Author: Gottfried Ganßauge (ganssauge) |
Date: 2003-11-27 10:32 |
Logged In: YES user_id=792746 I uploaded my profiling data, maybe it will help you ... Here is the information you requested:
----------------><------------------------><------------
(gotti@gglinux 534) PYTHONPATH=../../../COMMON.DEVEL/Tools/python/lib.linux-i686-2.3 python Konvertierung/entsch_pass2.py HI69228 x HR all_idx2.shelve <hi69228.sgml
Traceback (most recent call last):
  File "Konvertierung/entsch_pass2.py", line 1026, in ?
    init_idx_dict (idx_dict_name)
  File "../../COMMON/lib/EDB.py", line 54, in init_idx_dict
    idx_dict.has_key([])
  File "/usr/lib/python2.3/shelve.py", line 104, in has_key
    return self.dict.has_key(key)
  File "/usr/lib/python2.3/bsddb/__init__.py", line 142, in has_key
    return self.db.has_key(key)
TypeError: String or Integer object expected for key, list found
(gotti@gglinux 535) PYTHONPATH=../../../COMMON.DEVEL/Tools/python/lib.linux-i686-2.2 python2.2 Konvertierung/entsch_pass2.py HI69228 x HR all_idx2.shelve <hi69228.sgml
Traceback (most recent call last):
  File "Konvertierung/entsch_pass2.py", line 1026, in ?
    init_idx_dict (idx_dict_name)
  File "../../COMMON/lib/EDB.py", line 54, in init_idx_dict
    idx_dict.has_key([])
  File "/usr/lib/python2.2/shelve.py", line 62, in has_key
    return self.dict.has_key(key)
TypeError: key type must be string
(gotti@gglinux 536) python -V
Python 2.3.2
(gotti@gglinux 537) python2.2 -V
Python 2.2.3
(gotti@gglinux 538) uname -a
Linux gglinux 2.4.22 #1 SMP Mon Nov 3 11:40:28 CET 2003 i686 unknown unknown GNU/Linux
(gotti@gglinux 538) cat /etc/debian_version
testing/unstable
(gotti@gglinux 539) python2.2 -c 'import shelve ; d = shelve.open("all_idx2.shelve", "r"); print len (d.keys()) ; print d.keys()[0], d [d.keys()[0]]'
34983
HI568817 None
(gotti@gglinux 540) python2.3 -c 'import shelve ; d = shelve.open("all_idx2.shelve", "r"); print "# items in shelve:", len (d.keys()) ; print "Items look like: index", d.keys() [0], "value", d [d.keys()[0]]'
# items in shelve: 34983
Items look like: index HI568817 value None |
|
|
msg19150 - (view) |
Author: Gottfried Ganßauge (ganssauge) |
Date: 2003-11-27 10:42 |
Logged In: YES user_id=792746 What the heck ... here is the shelve in question |
|
|
msg19151 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2003-11-27 17:55 |
Logged In: YES user_id=80475 The fragment in the original posting showed that the only inner-loop shelve access was through has_key(). The tracebacks show that UserDict is nowhere in the traceback chain. I conclude that the fragment does not represent what is really going on in the problematic script. So, please attach the profiled script, Konvertierung/entsch_pass2.py. The attached profile indicates that somewhere, there is a line like: for k,v in idx_dict.iteritems(). This is surprising because shelves did not support iteritems() in Py2.2. That would mean that you've timed and compared two different pieces of code. Please show the shortest script with data that runs at radically different speeds on Py2.2 vs Py2.3. |
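For readers who want to build the kind of short, self-contained timing script requested here, the following is only a rough sketch in present-day Python 3 (so it uses `key in shelf` rather than `has_key()`, and it will not reproduce the Py2.2 vs Py2.3 difference). The helper names `build_shelf`, `time_lookups` and `run_benchmark`, the file name `all_idx2_demo`, and the generated keys are all hypothetical; only the key shape (a string index mapped to None) is modeled on the record shown in this thread.

```python
import os
import shelve
import tempfile
import time


def build_shelf(path, n_keys):
    """Create a shelve file whose keys look like 'HI000123' and map to None."""
    with shelve.open(path, "c") as shelf:
        for i in range(n_keys):
            shelf["HI%06d" % i] = None


def time_lookups(path, probes):
    """Open the shelf read-only, test each probe for membership, and time it."""
    with shelve.open(path, "r") as shelf:
        start = time.perf_counter()
        hits = sum(1 for key in probes if key in shelf)
        elapsed = time.perf_counter() - start
    return hits, elapsed


def run_benchmark(n_keys=1000):
    """Build a throwaway shelf and probe it with half hits, half misses."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "all_idx2_demo")  # hypothetical file name
        build_shelf(path, n_keys)
        # Every second probe falls outside the stored key range.
        probes = ["HI%06d" % i for i in range(0, 2 * n_keys, 2)]
        return time_lookups(path, probes)


if __name__ == "__main__":
    hits, elapsed = run_benchmark()
    print("hits:", hits, "seconds:", round(elapsed, 4))
```

Keeping the data generation inside the script means the same file can be run unchanged under each interpreter being compared, so the only variable is the Python version.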
|
|
msg19152 - (view) |
Author: Gottfried Ganßauge (ganssauge) |
Date: 2003-11-28 16:01 |
Logged In: YES user_id=792746 I think I found the answer: apart from has_key() I'm using "dict != None". If I leave that out in my test program both python variants run with the same speed. The dict != None condition seems to trigger len(dict.keys()) and that seems to be way slower than before. I definitely didn't time different scripts: the script is part of our CDROM production system and the only variables I had during my tests were python itself and the python path. Find my test script attached... |
|
|
msg19153 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2003-11-28 21:57 |
Logged In: YES user_id=80475 Yes, that was the culprit. I'll look for a way to make __cmp__ a bit smarter. In the meantime, the proper way to check for None is always: if dict is None. |
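To illustrate why the two None checks differ in cost, here is a minimal sketch in present-day Python 3. It is not the actual shelve or UserDict.DictMixin code: `CountingMapping` is a hypothetical stand-in for a disk-backed mapping, and because Python 3 has no `__cmp__`, the expensive comparison is modeled with `__eq__`/`__ne__` instead. The point is only that `!=` dispatches to the object's comparison method, which may read every record, whereas `is` compares identities and never calls into the object.

```python
class CountingMapping:
    """Hypothetical stand-in for a disk-backed mapping (not the real shelve)."""

    def __init__(self, data):
        self._data = dict(data)
        self.full_scans = 0  # how many times every record was read

    def items(self):
        # Pretend this walks the whole database file.
        self.full_scans += 1
        return list(self._data.items())

    def __eq__(self, other):
        # Naive comparison: materialize all records before comparing.
        if isinstance(other, CountingMapping):
            other = dict(other.items())
        return dict(self.items()) == other

    def __ne__(self, other):
        return not self.__eq__(other)

    __hash__ = None  # mutable mapping, so not hashable


if __name__ == "__main__":
    m = CountingMapping({"HI568817": None})
    print(m != None, m.full_scans)      # comparison method runs a full scan
    print(m is not None, m.full_scans)  # identity check adds no scan
```

With tens of thousands of entries on disk, one full scan per loop iteration is enough to dominate the run time, which is consistent with the profile described above.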
|
|
msg19154 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2003-12-07 11:55 |
Logged In: YES user_id=80475 I fixed-up your particular problem for Py2.3.3 and Py2.4. Leaving the report open because there are other calls which have performance issues. |
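The message does not say how the comparison was made smarter, so the following is only one plausible shape for such a change, sketched in present-day Python 3: answer a comparison against None immediately, before any records are read. `ShortCircuitMapping` and its `full_scans` counter are hypothetical names used for illustration and are not taken from the standard library.

```python
class ShortCircuitMapping:
    """Hypothetical mapping whose comparison avoids a scan when given None."""

    def __init__(self, data):
        self._data = dict(data)
        self.full_scans = 0  # how many times every record was read

    def items(self):
        # Pretend this walks the whole database file.
        self.full_scans += 1
        return list(self._data.items())

    def __eq__(self, other):
        if other is None:
            # A mapping is never equal to None, so skip the expensive scan.
            return False
        if isinstance(other, ShortCircuitMapping):
            other = dict(other.items())
        return dict(self.items()) == other

    def __ne__(self, other):
        return not self.__eq__(other)

    __hash__ = None  # mutable mapping, so not hashable


if __name__ == "__main__":
    m = ShortCircuitMapping({"HI568817": None})
    print(m != None, m.full_scans)                # no scan needed
    print(m == {"HI568817": None}, m.full_scans)  # real comparison still scans
```

A guard like this only helps the None case; comparisons against other mappings still read every record, which fits the remark that other calls remain slow.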
|
|
msg55408 - (view) |
Author: Skip Montanaro (skip.montanaro) *  |
Date: 2007-08-29 01:57 |
Raymond - can we close this ticket? |
|
|
msg110108 - (view) |
Author: Mark Lawrence (BreamoreBoy) * |
Date: 2010-07-12 16:30 |
Raymond - can we close this ticket? |
|
|