[Python-Dev] Retrieve an arbitrary element from a set without removing it (original) (raw)

Willi Richert w.richert at gmx.net
Fri Oct 23 22:53:24 CEST 2009


Hi,

surprised about the performance of for/break provided by Vitor, I did some more testing. It revealed that indeed we can forget the get() (which was implemented as a stripped down pop()):

from timeit import * stats = ["for i in xrange(1000): iter(s).next() ", "for i in xrange(1000): \n\tfor x in s: \n\t\tbreak ", "for i in xrange(1000): s.add(s.pop()) ", "for i in xrange(1000): s.get() "]

for stat in stats: t = Timer(stat, setup="s=set(range(100))") try: print "Time for %s:\t %f"%(stat, t.timeit(number=1000)) except: t.print_exc()

$ ./test_get.py Time for for i in xrange(1000): iter(s).next() : 0.433080 Time for for i in xrange(1000): for x in s: break : 0.148695 Time for for i in xrange(1000): s.add(s.pop()) : 0.317418 Time for for i in xrange(1000): s.get() : 0.146673

In some tests, for/break was even slightly faster then get().

As always, intuition regarding performance bottlenecks is flawed ;-)

Anyway, thanks for all the helpful comments, especially to Stefan for the http://comments.gmane.org/gmane.comp.python.ideas/5606 link.

Regards, wr

Am Freitag, 23. Oktober 2009 19:25:48 schrieb John Arbash Meinel:

Vitor Bosshard wrote: > 2009/10/23 Willi Richert <w.richert at gmx.net>: >> Hi, >> >> recently I wrote an algorithm, in which very often I had to get an >> arbitrary element from a set without removing it. >> >> Three possibilities came to mind: >> >> 1. >> x = someset.pop() >> someset.add(x) >> >> 2. >> for x in someset: >> break >> >> 3. >> x = iter(someset).next() >> >> >> Of course, the third should be the fastest. It nevertheless goes through >> all the iterator creation stuff, which costs some time. I wondered, why >> the builtin set does not provide a more direct and efficient way for >> retrieving some element without removing it. Is there any reason for >> this? >> >> I imagine something like >> >> x = someset.get() > > I see this as being useful for frozensets as well, where you can't get > an arbitrary element easily due to the obvious lack of .pop(). I ran > into this recently, when I had a frozenset that I knew had 1 element > (it was the difference between 2 other sets), but couldn't get to that > element easily (get the pun?)

So in my testing (2) was actually the fastest. I assumed because .next() was a function call overhead, while: for x in someset: break Was evaluated inline. It probably still has to call PyObjectGetIter, however it doesn't have to create a stack frame for it. This is what "timeit" tells me. All runs are of the form: python -m timeit -s "s = set([10])" ... 0.101us "for x in s: break; x" 0.130us "for x in s: pass; x" 0.234us -s "n = next; i = iter" "x = n(i(s)); x" 0.248us "x = next(iter(s)); x" 0.341us "x = iter(s).next(); x" So 'for x in s: break' is about 2x faster than next(iter(s)) and 3x faster than (iter(s).next()). I was pretty surprised that it was 30% faster than "for x in s: pass". I assume it has something to do with a potential "else:" statement? Note that all of these are significantly < 1us. So this only matters if_ _it is something you are doing often._ _I don't know your specific timings, but I would guess that:_ _for x in s: break_ _Is actually going to be faster than your_ _s.get()_ _Primarily because s.get() requires an attribute lookup. I would think_ _your version might be faster for:_ _stat2 = "g = s.get; for i in xrange(100): g()"_ _However, that is still a function call, which may be treated differently_ _by the interpreter than the for:break loop. I certainly suggest you try_ _it and compare._ _John_ _=:-> -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/python-dev/attachments/20091023/755690d7/attachment-0001.htm>



More information about the Python-Dev mailing list