[Python-ideas] while conditional in list comprehension ?? (original) (raw)

Wolfgang Maier wolfgang.maier at biologie.uni-freiburg.de
Wed Jan 30 10:46:31 CET 2013


Yuriy Taraday <yorik.sar at ...> writes:

On Tue, Jan 29, 2013 at 7:44 PM, Wolfgang Maier <wolfgang.maier at biologie.uni-freiburg.de> wrote: list(i for i in range(100) if i<50 or stop()) Really (!) nice (and 2x as fast as using itertools.takewhile())!

I couldn't believe it so I had to check it: from future import printfunction import functools, itertools, operator, timeit def var1(): def gen(): for i in range(100): if i > 50: break yield i return list(gen()) def var2(): def stop(): raise StopIteration return list(i for i in range(100) if i <= 50 or stop()) def var3(): return [i for i in itertools.takewhile(lambda n: n <= 50, range(100))] def var4(): return [i for i in itertools.takewhile(functools.partial(operator.lt, 50), range(100))] if name == 'main': for f in (var1, var2, var3, var4): print(f.name, end=' ') print(timeit.timeit(f)) Results on my machine: var1 20.4974410534 var2 23.6218020916 var3 32.1543409824 var4 4.90913701057 var1 might have became the fastest of the first 3 because it's a special and very simple case. Why should explicit loops be slower that generator expressions? var3 is the slowest. I guess, because it has lambda in it. But switching to Python and back can not be faster than the last option - sitting in the C code as much as we can. -- Kind regards, Yuriy.

Steven D'Aprano <steve at ...> writes:

On 30/01/13 02:44, Wolfgang Maier wrote: > list(i for i in range(100) if i<50 or stop())_ _> Really (!) nice (and 2x as fast as using itertools.takewhile())! I think you are mistaken about the speed. The itertools iterators are highly optimized and do all their work in fast C code. If you are seeing takewhile as slow, you are probably doing something wrong: untrustworthy timing code, misinterpreting what you are seeing, or some other error. Here's a comparison done the naive or obvious way. Copy and paste it into an interactive Python session: from itertools import takewhile from timeit import Timer def stop(): raise StopIteration setup = 'from main import stop, takewhile' t1 = Timer('list(i for i in xrange(1000) if i < 50 or stop())', setup)_ _t2 = Timer('[i for i in takewhile(lambda x: x < 50, xrange(1000))]', setup)_ _min(t1.repeat(number=100000, repeat=5))_ _min(t2.repeat(number=100000, repeat=5))_ _On my computer, t1 is about 1.5 times faster than t2. But this is misleading,_ _because it's not takewhile that is slow. I am feeding something slow into_ _takewhile. If I really need to run as fast as possible, I can optimize the_ _function call inside takewhile:_ _from operator import lt_ _from functools import partial_ _smallenough = partial(lt, 50)_ _setup2 = 'from _main_ import takewhile, smallenough'_ _t3 = Timer('[i for i in takewhile(smallenough, xrange(1000))]', setup2)_ _min(t3.repeat(number=100000, repeat=5))_ _On my computer, t3 is nearly 13 times faster than t1, and 19 times faster_ _than t2. Here are the actual times I get, using Python 2.7:_ _py> min(t1.repeat(number=100000, repeat=5)) # using the StopIteration hack 1.2609241008758545 py> min(t2.repeat(number=100000, repeat=5)) # takewhile and lambda 1.85182785987854 py> min(t3.repeat(number=100000, repeat=5)) # optimized version 0.09847092628479004

Hi Yuriy and Steven, a) I had compared the originally proposed 'takewhile with lambda' version to the 'if cond or stop()' solution using 'timeit' just like you did. In principle, you find the same as I did, although I am a bit surprised that our differences are different. To be exact 'if cond or stop()' was 1.84 x faster in my hands than 'takewhile with lambda'.

b) I have to say I was very impressed by the speed gains you report through the use of 'partial', which I had not thought of at all, I have to admit. However, I tested your suggestions and I think they both suffer from the same mistake: your condition is 'partial(lt,50)', but this is not met to begin with and results in an empty list at least for me. Have you two actually checked the output of the code or have you just timed it? I found that in order to make it work the comparison has to be made via 'partial(gt,50)'. With this modification the resulting list in your example would be [0,..,49] as it should be.

And now the big surprise in terms of runtimes: partial(lt,50) variant: 1.17 (but incorrect results) partial(gt,50) variant: 13.95 if cond or stop() variant: 9.86

I guess python is just smart enough to recognize that it compares against a constant value all the time, and optimizes the code accordingly (after all the if clause is a pretty standard thing to use in a comprehension).

So the reason for your reported speed-gain is that you actually broke out of the comprehension at the very first element instead of going through the first 50! Please comment, if you get different results. Best, Wolfgang



More information about the Python-ideas mailing list