Issue 4790: Optimization to heapq module (original) (raw)

Issue4790

Created on 2008-12-31 01:07 by nilton, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
heapq_optimization.diff nilton,2008-12-31 01:07
Messages (4)
msg78584 - (view) Author: Nilton Volpato (nilton) Date: 2008-12-31 01:07
The wrapper around heapq.nlargest and heapq.nsmallest is much slower than it's C version. $ python2.5 -mtimeit -s 'import random; random.seed(123); n=999999; x=range(n); random.shuffle(x); import _heapq' '_heapq.nlargest(3, x)' 10 loops, best of 3: 142 msec per loop $ python2.5 -mtimeit -s 'import random; random.seed(123); n=999999; x=range(n); random.shuffle(x); import heapq' 'heapq.nlargest(3, x)' 10 loops, best of 3: 685 msec per loop If the key argument is None, there is no need to use the wrapper. This patch adds an a check to avoid this.
msg78587 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-12-31 01:48
Am rejecting the patch because it violates the sort equivalence guarantee (including sort's promise to maintain stability). If you need the speed and don't care about sort stability, then just use _heapq.nsmallest() or _heapq.nlargest() directly. We could complexify the code a bit to achieve some automatic speed-up in the case of key==None, but my timings don't show much of an improvement: if key is None: it = izip(iterable, count()) # decorate result = _nsmallest(n, it) return map(itemgetter(0), result) # undecorate else: in1, in2 = tee(iterable) it = izip(imap(key, in1), count(), in2) # decorate result = _nsmallest(n, it) return map(itemgetter(2), result) # undecorate
msg78590 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-12-31 04:34
After looking at the ugly handling of this logic in the 3.0 translation, I've reconsidered. It is better to have a separate path for the case where the key is None. See r68095 and r68096 .
msg78654 - (view) Author: Nilton Volpato (nilton) Date: 2008-12-31 21:47
Nice! Maybe we could add the decorate/undecorate step to guarantee stability to the C implementation. I'll do some experiments and timings on this. The heapq library has a lot of room for optimization, I think.
History
Date User Action Args
2022-04-11 14:56:43 admin set github: 49040
2008-12-31 21:47:38 nilton set messages: +
2008-12-31 04:34:02 rhettinger set resolution: rejected -> acceptedmessages: +
2008-12-31 01:48:53 rhettinger set status: open -> closedassignee: rhettingermessages: + resolution: rejectednosy: + rhettinger
2008-12-31 01:07:19 nilton create