Issue 2672: speed of set.update( (original) (raw)

Created on 2008-04-23 02:44 by jameinel, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (7)
msg65694 - (view) Author: John Arbash Meinel (jameinel) Date: 2008-04-23 02:44
I was performance profiling some of my own code, and I ran into something unexpected. Specifically, set.update(empty_generator_expression) was significantly slower than set.update(empty_list_expression). I double checked my findings with timeit: With python 2.4.3: $ python -m timeit -s 'x = set(range(10000))' 'x.update([])' 1000000 loops, best of 3: 0.296 usec per loop $ python -m timeit -s 'x = set(range(10000))' 'x.update(y for y in [])' 1000000 loops, best of 3: 0.837 usec per loop $ python -m timeit -s 'x = set(range(10000))' 'x.update([y for y in []])' 1000000 loops, best of 3: 0.462 usec per loop With 2.5.1 (on a different machine) $ python -m timeit -s 'x = set(range(10000))' 'x.update([])' 1000000 loops, best of 3: 0.265 usec per loop $ python -m timeit -s 'x = set(range(10000))' 'x.update(y for y in [])' 1000000 loops, best of 3: 0.717 usec per loop $ python -m timeit -s 'x = set(range(10000))' 'x.update([y for y in []])' 1000000 loops, best of 3: 0.39 usec per loop So generally, it is about 2x faster to create the empty list expression and pass it in than to use an empty generator.
msg65704 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2008-04-23 20:19
This has nothing to do with set.update, the difference is due to the time to setup the generator: $ python -m timeit -s 'x = set(range(10000)); y = []' 'x.update(y)' 1000000 loops, best of 3: 0.38 usec per loop $ python -m timeit -s 'x = set(range(10000)); y = (i for i in [])' 'x.update(y)' 1000000 loops, best of 3: 0.335 usec per loop
msg65705 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-04-23 20:38
I concur. The source code for set_update() in Objects/setobject.c shows that both versions are accessed through the iterator protocol, so what you're seeing are the basic performance differences (start-up and per-item costs) for genexps vs list iterators.
msg65737 - (view) Author: John Arbash Meinel (jameinel) Date: 2008-04-24 18:23
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Alexander Belopolsky wrote: > Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment: > > This has nothing to do with set.update, the difference is due to the > time to setup the generator: > > $ python -m timeit -s 'x = set(range(10000)); y = []' 'x.update(y)' > 1000000 loops, best of 3: 0.38 usec per loop > $ python -m timeit -s 'x = set(range(10000)); y = (i for i in [])' > 'x.update(y)' > 1000000 loops, best of 3: 0.335 usec per loop > > ---------- > nosy: +belopolsky > > __________________________________ > Tracker <report@bugs.python.org> > <http://bugs.python.org/issue2672> > __________________________________ > That is true, though if I just force a generator overhead: % python -m timeit -s 'x = set(range(10000)); y = []' 'x.update(y)' 1000000 loops, best of 3: 0.204 usec per loop % python -m timeit -s 'x = set(range(10000)); y = (i for i in [])' 'x.update(y)' 10000000 loops, best of 3: 0.173 usec per loop % python -m timeit -s 'x = set(range(10000)); l = []' 'x.update(i for i in l)' 1000000 loops, best of 3: 0.662 usec per loop python -m timeit -s 'x = set(range(10000)); l = []; y = (i for i in l)' '(i for i in l); x.update(y)' 1000000 loops, best of 3: 1.87 usec per loop So if you compare consuming a generator multiple times to creating it each time, it is 0.662 usec - 0.173 usec = 0.489 usec to create a generator. So why does: "(i for i in l); x.update(y)" take an additional 1.208 usec. (I'm certainly willing to believe that set.update() is generator/list agnostic, but something weird is still happening.) John =:-> -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIENAoJdeBCYSNAAMRAk2yAJ4okAalR6zWD0/E5XHei/ckce+L7QCgstEQ l+6+bl7oAJMhdJ70viqicnQ= =pLX6 -----END PGP SIGNATURE-----
msg65742 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2008-04-24 19:29
On Thu, Apr 24, 2008 at 2:23 PM, John Arbash Meinel <report@bugs.python.org> wrote: .. > So if you compare consuming a generator multiple times to creating it > each time, it is 0.662 usec - 0.173 usec = 0.489 usec to create a generator. > > So why does: "(i for i in l); x.update(y)" take an additional 1.208 usec. > > (I'm certainly willing to believe that set.update() is generator/list > agnostic, but something weird is still happening.) I've seen a similar strangeness in timings: $ python -m timeit '(i for i in [])' 100000 loops, best of 3: 4.16 usec per loop but $ python -m timeit -s 'x = set()' 'x.update(i for i in [])' 1000000 loops, best of 3: 1.31 usec per loop on the other hand, $ python -m timeit -s 'x = []' 'x.extend(i for i in [])' 100000 loops, best of 3: 4.54 usec per loop How can x.update(i for i in []) take *less* time than simply creating a genexp? Note that there is no apparent bytecode tricks here: 1 0 LOAD_CONST 0 (<code object at 0xf7e88920, file "", line 1>) 3 MAKE_FUNCTION 0 6 BUILD_LIST 0 9 GET_ITER 10 CALL_FUNCTION 1 13 RETURN_VALUE >>> dis(lambda:x.update(i for i in [])) 1 0 LOAD_GLOBAL 0 (x) 3 LOAD_ATTR 1 (update) 6 LOAD_CONST 0 (<code object at 0xf7e88920, file "", line 1>) 9 MAKE_FUNCTION 0 12 BUILD_LIST 0 15 GET_ITER 16 CALL_FUNCTION 1 19 CALL_FUNCTION 1 22 RETURN_VALUE
msg65752 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-04-24 21:24
John, when y=[], the update method has to create a new list iterator on each invocation. But when y is a genexp, it is self-iterable (iow, iter (y) will return self, not a new object). Also, when doing timings, it can be helpful to factor-out the attribute lookup: python -m timeit -s 'x=set(range(10000)); y=[]; xu=x.update' 'xu(y)'
msg65753 - (view) Author: John Arbash Meinel (jameinel) Date: 2008-04-24 21:39
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Raymond Hettinger wrote: > Raymond Hettinger <rhettinger@users.sourceforge.net> added the comment: > > John, when y=[], the update method has to create a new list iterator on > each invocation. But when y is a genexp, it is self-iterable (iow, iter > (y) will return self, not a new object). > > Also, when doing timings, it can be helpful to factor-out the attribute > lookup: > > python -m timeit -s 'x=set(range(10000)); y=[]; xu=x.update' 'xu(y)' > > __________________________________ > Tracker <report@bugs.python.org> > <http://bugs.python.org/issue2672> > __________________________________ > Sure, I wasn't surprised at the "set.update(y)" versus "set.update([])" What I was surprised at is the time for: "(i for i in [])" being about 4x longer than "set.update(i for i in [])" Anyway, the original issue is probably closed, whether we want to track into the generator stuff or not is probably a different issue. John =:-> -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIEP4EJdeBCYSNAAMRAq+MAKC6tLjEtIBX7YgLNoYEfqjRKB4DzACglXjh cEVLEP5Hu3vpeVgVYdTbAVc= =94ja -----END PGP SIGNATURE-----
History
Date User Action Args
2022-04-11 14:56:33 admin set github: 46924
2008-04-24 21:39:35 jameinel set messages: +
2008-04-24 21:24:48 rhettinger set messages: +
2008-04-24 19:29:34 belopolsky set messages: +
2008-04-24 18:23:47 jameinel set messages: + title: speed of set.update([]) -> speed of set.update(
2008-04-23 20:38:27 rhettinger set status: open -> closedresolution: not a bugmessages: +
2008-04-23 20:19:54 belopolsky set nosy: + belopolskymessages: +
2008-04-23 03:59:35 rhettinger set assignee: rhettingernosy: + rhettinger
2008-04-23 02:44:55 jameinel set type: performance
2008-04-23 02:44:42 jameinel create