msg65694 - (view) |
Author: John Arbash Meinel (jameinel) |
Date: 2008-04-23 02:44 |
I was performance profiling some of my own code, and I ran into something unexpected. Specifically, set.update(empty_generator_expression) was significantly slower than set.update(empty_list_expression). I double checked my findings with timeit: With python 2.4.3: $ python -m timeit -s 'x = set(range(10000))' 'x.update([])' 1000000 loops, best of 3: 0.296 usec per loop $ python -m timeit -s 'x = set(range(10000))' 'x.update(y for y in [])' 1000000 loops, best of 3: 0.837 usec per loop $ python -m timeit -s 'x = set(range(10000))' 'x.update([y for y in []])' 1000000 loops, best of 3: 0.462 usec per loop With 2.5.1 (on a different machine) $ python -m timeit -s 'x = set(range(10000))' 'x.update([])' 1000000 loops, best of 3: 0.265 usec per loop $ python -m timeit -s 'x = set(range(10000))' 'x.update(y for y in [])' 1000000 loops, best of 3: 0.717 usec per loop $ python -m timeit -s 'x = set(range(10000))' 'x.update([y for y in []])' 1000000 loops, best of 3: 0.39 usec per loop So generally, it is about 2x faster to create the empty list expression and pass it in than to use an empty generator. |
|
|
msg65704 - (view) |
Author: Alexander Belopolsky (belopolsky) *  |
Date: 2008-04-23 20:19 |
This has nothing to do with set.update, the difference is due to the time to setup the generator: $ python -m timeit -s 'x = set(range(10000)); y = []' 'x.update(y)' 1000000 loops, best of 3: 0.38 usec per loop $ python -m timeit -s 'x = set(range(10000)); y = (i for i in [])' 'x.update(y)' 1000000 loops, best of 3: 0.335 usec per loop |
|
|
msg65705 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2008-04-23 20:38 |
I concur. The source code for set_update() in Objects/setobject.c shows that both versions are accessed through the iterator protocol, so what you're seeing are the basic performance differences (start-up and per-item costs) for genexps vs list iterators. |
|
|
msg65737 - (view) |
Author: John Arbash Meinel (jameinel) |
Date: 2008-04-24 18:23 |
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Alexander Belopolsky wrote: > Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment: > > This has nothing to do with set.update, the difference is due to the > time to setup the generator: > > $ python -m timeit -s 'x = set(range(10000)); y = []' 'x.update(y)' > 1000000 loops, best of 3: 0.38 usec per loop > $ python -m timeit -s 'x = set(range(10000)); y = (i for i in [])' > 'x.update(y)' > 1000000 loops, best of 3: 0.335 usec per loop > > ---------- > nosy: +belopolsky > > __________________________________ > Tracker <report@bugs.python.org> > <http://bugs.python.org/issue2672> > __________________________________ > That is true, though if I just force a generator overhead: % python -m timeit -s 'x = set(range(10000)); y = []' 'x.update(y)' 1000000 loops, best of 3: 0.204 usec per loop % python -m timeit -s 'x = set(range(10000)); y = (i for i in [])' 'x.update(y)' 10000000 loops, best of 3: 0.173 usec per loop % python -m timeit -s 'x = set(range(10000)); l = []' 'x.update(i for i in l)' 1000000 loops, best of 3: 0.662 usec per loop python -m timeit -s 'x = set(range(10000)); l = []; y = (i for i in l)' '(i for i in l); x.update(y)' 1000000 loops, best of 3: 1.87 usec per loop So if you compare consuming a generator multiple times to creating it each time, it is 0.662 usec - 0.173 usec = 0.489 usec to create a generator. So why does: "(i for i in l); x.update(y)" take an additional 1.208 usec. (I'm certainly willing to believe that set.update() is generator/list agnostic, but something weird is still happening.) John =:-> -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIENAoJdeBCYSNAAMRAk2yAJ4okAalR6zWD0/E5XHei/ckce+L7QCgstEQ l+6+bl7oAJMhdJ70viqicnQ= =pLX6 -----END PGP SIGNATURE----- |
|
|
msg65742 - (view) |
Author: Alexander Belopolsky (belopolsky) *  |
Date: 2008-04-24 19:29 |
On Thu, Apr 24, 2008 at 2:23 PM, John Arbash Meinel <report@bugs.python.org> wrote: .. > So if you compare consuming a generator multiple times to creating it > each time, it is 0.662 usec - 0.173 usec = 0.489 usec to create a generator. > > So why does: "(i for i in l); x.update(y)" take an additional 1.208 usec. > > (I'm certainly willing to believe that set.update() is generator/list > agnostic, but something weird is still happening.) I've seen a similar strangeness in timings: $ python -m timeit '(i for i in [])' 100000 loops, best of 3: 4.16 usec per loop but $ python -m timeit -s 'x = set()' 'x.update(i for i in [])' 1000000 loops, best of 3: 1.31 usec per loop on the other hand, $ python -m timeit -s 'x = []' 'x.extend(i for i in [])' 100000 loops, best of 3: 4.54 usec per loop How can x.update(i for i in []) take *less* time than simply creating a genexp? Note that there is no apparent bytecode tricks here: 1 0 LOAD_CONST 0 (<code object at 0xf7e88920, file "", line 1>) 3 MAKE_FUNCTION 0 6 BUILD_LIST 0 9 GET_ITER 10 CALL_FUNCTION 1 13 RETURN_VALUE >>> dis(lambda:x.update(i for i in [])) 1 0 LOAD_GLOBAL 0 (x) 3 LOAD_ATTR 1 (update) 6 LOAD_CONST 0 (<code object at 0xf7e88920, file "", line 1>) 9 MAKE_FUNCTION 0 12 BUILD_LIST 0 15 GET_ITER 16 CALL_FUNCTION 1 19 CALL_FUNCTION 1 22 RETURN_VALUE |
|
|
msg65752 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2008-04-24 21:24 |
John, when y=[], the update method has to create a new list iterator on each invocation. But when y is a genexp, it is self-iterable (iow, iter (y) will return self, not a new object). Also, when doing timings, it can be helpful to factor-out the attribute lookup: python -m timeit -s 'x=set(range(10000)); y=[]; xu=x.update' 'xu(y)' |
|
|
msg65753 - (view) |
Author: John Arbash Meinel (jameinel) |
Date: 2008-04-24 21:39 |
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Raymond Hettinger wrote: > Raymond Hettinger <rhettinger@users.sourceforge.net> added the comment: > > John, when y=[], the update method has to create a new list iterator on > each invocation. But when y is a genexp, it is self-iterable (iow, iter > (y) will return self, not a new object). > > Also, when doing timings, it can be helpful to factor-out the attribute > lookup: > > python -m timeit -s 'x=set(range(10000)); y=[]; xu=x.update' 'xu(y)' > > __________________________________ > Tracker <report@bugs.python.org> > <http://bugs.python.org/issue2672> > __________________________________ > Sure, I wasn't surprised at the "set.update(y)" versus "set.update([])" What I was surprised at is the time for: "(i for i in [])" being about 4x longer than "set.update(i for i in [])" Anyway, the original issue is probably closed, whether we want to track into the generator stuff or not is probably a different issue. John =:-> -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIEP4EJdeBCYSNAAMRAq+MAKC6tLjEtIBX7YgLNoYEfqjRKB4DzACglXjh cEVLEP5Hu3vpeVgVYdTbAVc= =94ja -----END PGP SIGNATURE----- |
|
|