The wrapper around heapq.nlargest and heapq.nsmallest is much slower than it's C version. $ python2.5 -mtimeit -s 'import random; random.seed(123); n=999999; x=range(n); random.shuffle(x); import _heapq' '_heapq.nlargest(3, x)' 10 loops, best of 3: 142 msec per loop $ python2.5 -mtimeit -s 'import random; random.seed(123); n=999999; x=range(n); random.shuffle(x); import heapq' 'heapq.nlargest(3, x)' 10 loops, best of 3: 685 msec per loop If the key argument is None, there is no need to use the wrapper. This patch adds an a check to avoid this.
Am rejecting the patch because it violates the sort equivalence guarantee (including sort's promise to maintain stability). If you need the speed and don't care about sort stability, then just use _heapq.nsmallest() or _heapq.nlargest() directly. We could complexify the code a bit to achieve some automatic speed-up in the case of key==None, but my timings don't show much of an improvement: if key is None: it = izip(iterable, count()) # decorate result = _nsmallest(n, it) return map(itemgetter(0), result) # undecorate else: in1, in2 = tee(iterable) it = izip(imap(key, in1), count(), in2) # decorate result = _nsmallest(n, it) return map(itemgetter(2), result) # undecorate
After looking at the ugly handling of this logic in the 3.0 translation, I've reconsidered. It is better to have a separate path for the case where the key is None. See r68095 and r68096 .
Nice! Maybe we could add the decorate/undecorate step to guarantee stability to the C implementation. I'll do some experiments and timings on this. The heapq library has a lot of room for optimization, I think.
History
Date
User
Action
Args
2022-04-11 14:56:43
admin
set
github: 49040
2008-12-31 21:47:38
nilton
set
messages: +
2008-12-31 04:34:02
rhettinger
set
resolution: rejected -> acceptedmessages: +
2008-12-31 01:48:53
rhettinger
set
status: open -> closedassignee: rhettingermessages: + resolution: rejectednosy: + rhettinger