GH-100425: Timing experiment: For builtin_sum, try replacing Fast2Sum with 2Sum by rhettinger · Pull Request #100860 · python/cpython


rhettinger

On the Apple M1 Max, this change makes no difference. I get 303/304 nsec per loop before and after the edit.

Would anyone care to run this on their builds and report back the results?

% ./python.exe -m timeit -r21 -s 'n=100' -s 'from random import expovariate as r' -s 'v1=[r(1000) + r(0.125) for i in range(n)]'   'sum(v1)'
1000000 loops, best of 21: 304 nsec per loop
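For anyone who prefers to run the same measurement from a script, here is a rough Python equivalent of the command above. It is only a sketch: the helper name `best_nsec_per_loop` is made up for illustration, and the numbers it reports depend on the machine and the build, so they will not necessarily match the 304 nsec figure.

```python
import timeit

# Same setup as the command line above: 100 floats drawn from
# two exponential distributions with very different scales.
SETUP = """\
from random import expovariate as r
n = 100
v1 = [r(1000) + r(0.125) for i in range(n)]
"""


def best_nsec_per_loop(repeat=21, number=None):
    """Return the best per-loop time for sum(v1), in nanoseconds.

    Hypothetical helper: mirrors `-m timeit -r21`, which reports the
    minimum over `repeat` runs of `number` loops each.
    """
    timer = timeit.Timer("sum(v1)", setup=SETUP)
    if number is None:
        # Let timeit pick a loop count, as the command-line tool does.
        number, _ = timer.autorange()
    times = timer.repeat(repeat=repeat, number=number)
    return min(times) / number * 1e9
```

Calling `best_nsec_per_loop()` on a build of main and on a build of this PR gives two numbers to compare. Taking the minimum rather than the mean helps filter out noise from other processes.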

@hauntsaninja

I see no difference either, on Linux with an AMD Zen 2 chip.

@eendebakpt

Both with and without optimizations I see no difference. System: Linux, gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1), Intel(R) Core(TM) i7-4710MQ CPU @ 2.50GHz

@rhettinger

Thank you both. It would be nice to hear from a Windows person as well.

@eendebakpt

> Thank you both. It would be nice to hear from a Windows person as well.

On Windows (default PCbuild/build.bat, no PGO) the timings vary a lot on my system (Intel(R) Core(TM) i7-4710MQ CPU @ 2.50GHz, Windows 10, VS 2019). For this PR, measurements within 5 minutes:

(image: timing measurements, not reproduced here)

I can confirm that the minimum time for the test is roughly the same for main and this PR.

@rhettinger

Thank you. I appreciate it.

@mdickinson Given that 2Sum and Fast2Sum have the same performance in the context of builtin.sum(), do we have a non-performance reason to choose one over the other? Or should I leave the sum() code as-is?

@rhettinger changed the title from "Timing experiment: For builtin_sum, try replacing Fast2Sum with 2Sum" to "GH-100425: Timing experiment: For builtin_sum, try replacing Fast2Sum with 2Sum" on Jan 12, 2023

@mdickinson

@rhettinger Leaving as-is sounds good to me. The two should be functionally identical, so performance is just about the only thing that would justify choosing one over the other.
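To illustrate the "functionally identical" point, here is a minimal Python sketch of the two error-free additions. It is not the C code in `builtin_sum`; the function names and the `compensated_sum` driver are assumptions made for the example. Fast2Sum needs the larger-magnitude operand first, so it is wrapped in a comparison, while 2Sum is branch-free at the cost of a few more floating-point operations. For finite inputs that do not overflow, both should return the same rounded sum and the same rounding error.

```python
def fast_two_sum(a, b):
    """Fast2Sum: exact only when abs(a) >= abs(b)."""
    s = a + b
    t = b - (s - a)          # rounding error of a + b
    return s, t


def ordered_fast_two_sum(a, b):
    """Fast2Sum with the magnitude check done up front (one branch)."""
    if abs(a) >= abs(b):
        return fast_two_sum(a, b)
    return fast_two_sum(b, a)


def two_sum(a, b):
    """2Sum: no precondition on magnitudes, no branch."""
    s = a + b
    a_virtual = s - b
    b_virtual = s - a_virtual
    t = (a - a_virtual) + (b - b_virtual)   # rounding error of a + b
    return s, t


def compensated_sum(values, error_free_add):
    """Illustrative compensated summation using either helper above."""
    total = 0.0
    comp = 0.0               # running sum of the rounding errors
    for x in values:
        total, err = error_free_add(total, x)
        comp += err
    return total + comp
```

Passing either `two_sum` or `ordered_fast_two_sum` to `compensated_sum` should give the same result on ordinary finite data, which is consistent with performance being the only reason to prefer one over the other.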