GH-100425: Timing experiment: For builtin_sum, try replacing Fast2Sum with 2Sum by rhettinger · Pull Request #100860 · python/cpython
On the Apple M1 Max, this change makes no difference. I get 303/304 nsec per loop before and after the edit.
Would anyone care to run this on their builds and report back the results?
% ./python.exe -m timeit -r21 -s 'n=100' -s 'from random import expovariate as r' -s 'v1=[r(1000) + r(0.125) for i in range(n)]' 'sum(v1)'
1000000 loops, best of 21: 304 nsec per loop
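For anyone who prefers to run the measurement from a script rather than the shell, here is a rough equivalent using the standard `timeit` module. It is only a sketch: the helper name `best_nsec_per_loop` and the `SETUP` constant are made up for illustration, and the loop count is chosen with `Timer.autorange()`, which may not pick exactly the same count as the command-line tool.

```python
import timeit

# Same setup statements as the -s options in the command above.
SETUP = """
from random import expovariate as r
n = 100
v1 = [r(1000) + r(0.125) for i in range(n)]
"""


def best_nsec_per_loop(repeat=21, number=None):
    """Return the best (minimum) time per sum(v1) call, in nanoseconds.

    Hypothetical helper: repeat=21 mirrors -r21; if number is None,
    the loop count is picked automatically via Timer.autorange().
    """
    timer = timeit.Timer("sum(v1)", setup=SETUP)
    if number is None:
        number, _ = timer.autorange()
    times = timer.repeat(repeat=repeat, number=number)
    return min(times) / number * 1e9
```

Calling `best_nsec_per_loop()` with no arguments mirrors the `-r21` invocation. Run it under each build's interpreter and compare the two numbers.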
I see no difference either, on Linux with an AMD Zen 2 chip.
Both with and without optimizations I see no difference. System: Linux, gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1), Intel(R) Core(TM) i7-4710MQ CPU @ 2.50GHz
Thank you both. It would be nice to hear from a Windows person as well.
On Windows (default PCbuild/build.bat, no PGO) the timings vary a lot on my system (Intel(R) Core(TM) i7-4710MQ CPU @ 2.50GHz, Windows 10, VS 2019). For this PR, measurements within 5 minutes:
I can confirm that the minimum time for the test is roughly the same for main and this PR.
Thank you. I appreciate it.
@mdickinson Given that 2Sum and Fast2Sum have the same performance in the context of builtin.sum(), do we have a non-performance reason to choose one over the other? Or should I leave the sum() code as-is?
rhettinger changed the title from "Timing experiment: For builtin_sum, try replacing Fast2Sum with 2Sum" to "GH-100425: Timing experiment: For builtin_sum, try replacing Fast2Sum with 2Sum"
@rhettinger Leaving as-is sounds good to me. The two should be functionally identical, so performance is just about the only thing that would justify choosing one over the other.
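For readers following along, here is a minimal Python sketch of the two error-free transformations being compared. It is not the C code from this PR; the function names `fast_two_sum`, `two_sum`, and `is_exact` are illustrative only. Fast2Sum uses three floating-point operations but needs `|a| >= |b|`, so the sketch swaps the operands first. 2Sum uses six operations and needs no comparison. Both return the rounded sum `s` and the rounding error `t`, so that `s + t` equals `a + b` exactly (assuming round-to-nearest and no overflow).

```python
from fractions import Fraction


def fast_two_sum(a, b):
    """Fast2Sum (Dekker): 3 flops, requires |a| >= |b|."""
    if abs(a) < abs(b):
        a, b = b, a          # enforce the precondition with a branch
    s = a + b                # rounded sum
    t = b - (s - a)          # rounding error of the addition
    return s, t


def two_sum(a, b):
    """2Sum (Knuth/Moller): 6 flops, branch-free, no ordering needed."""
    s = a + b                # rounded sum
    a_virtual = s - b        # the part of s that came from a
    b_virtual = s - a_virtual  # the part of s that came from b
    t = (a - a_virtual) + (b - b_virtual)  # rounding error
    return s, t


def is_exact(a, b, s, t):
    """Check with exact rational arithmetic that s + t == a + b."""
    return Fraction(a) + Fraction(b) == Fraction(s) + Fraction(t)
```

With inputs such as `1e16` and `1.0`, where the small addend is lost in the rounded sum, both functions recover it in `t`. That is the sense in which the two are functionally identical.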