Issue 29698: _collectionsmodule.c: Replace n++; while (--n) with for (; n; --n) (original) (raw)

Created on 2017-03-02 16:46 by Kojoley, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
_collectionsmodule.s rhettinger,2017-03-02 22:21 GCC-6 Disassembly of reverse() in Modules/_collections.c
Pull Requests
URL Status Linked Edit
PR 396 closed Kojoley,2017-03-02 16:46
Messages (6)
msg288815 - (view) Author: Nikita Kniazev (Kojoley) * Date: 2017-03-02 16:46
I have failed to find previous discussion when `while (n--)` was changed to `n++; while (--n)`. (commits 306d6b1ea6bf2582b9284be2fd27275abbade3e1, 165eee214bc388eb588db33385ca49ddbb305565) It is clear for me that n++; while (--n) and for (; n; --n) are interchangable statements, but here is the prof of it http://coliru.stacked-crooked.com/a/a6fc4108b223e7b2. According to asm out https://godbolt.org/g/heHM33 the `for` loop is even shorter (takes less instructions). While I believe that location of `cmp`/`jmp` instruction makes no sense and performance is the same, but I have made a benchmark. ``` Run on (4 X 3310 MHz CPU s) 02/27/17 22:10:55 Benchmark Time CPU Iterations ------------------------------------------------------------ BM_while_loop/2 13 ns 13 ns 56089384 BM_while_loop/4 17 ns 16 ns 40792279 BM_while_loop/8 24 ns 24 ns 29914338 BM_while_loop/16 40 ns 40 ns 20396140 BM_while_loop/32 84 ns 80 ns 8974301 BM_while_loop/64 146 ns 146 ns 4487151 BM_while_loop/128 270 ns 269 ns 2492862 BM_while_loop/128 267 ns 266 ns 2639500 BM_while_loop/512 1022 ns 1022 ns 641022 BM_while_loop/4096 8203 ns 8344 ns 89743 BM_while_loop/32768 66971 ns 66750 ns 11218 BM_while_loop/262144 545833 ns 546003 ns 1000 BM_while_loop/2097152 4376095 ns 4387528 ns 160 BM_while_loop/8388608 17654654 ns 17883041 ns 41 BM_for_loop/2 13 ns 13 ns 56089384 BM_for_loop/4 15 ns 15 ns 49857230 BM_for_loop/8 21 ns 21 ns 32051077 BM_for_loop/16 37 ns 37 ns 19509351 BM_for_loop/32 81 ns 80 ns 8974301 BM_for_loop/64 144 ns 128 ns 4985723 BM_for_loop/128 265 ns 263 ns 3205108 BM_for_loop/128 265 ns 266 ns 2639500 BM_for_loop/512 1036 ns 1022 ns 641022 BM_for_loop/4096 8314 ns 8344 ns 89743 BM_for_loop/32768 67345 ns 66750 ns 11218 BM_for_loop/262144 541310 ns 546004 ns 1000 BM_for_loop/2097152 4354986 ns 4387528 ns 160 BM_for_loop/8388608 17592428 ns 17122061 ns 41 ``` ```cpp #include <benchmark/benchmark.h> #define MAKE_ROTL_BENCHMARK(name) \ static void BM_##name(benchmark::State& state) { \ while (state.KeepRunning()) { \ int n = name(state.range(0)); \ } \ } \ /**/ int while_loop(int n) { int sum = 0; n++; while (--n) { sum += 1; } return sum; } int for_loop(int n) { int sum = 0; for(; n; --n) { sum += 1; } return sum; } MAKE_ROTL_BENCHMARK(while_loop) MAKE_ROTL_BENCHMARK(for_loop) BENCHMARK(BM_while_loop)->RangeMultiplier(2)->Range(2, 8<<4); BENCHMARK(BM_while_loop)->Range(8<<4, 8<<20); BENCHMARK(BM_for_loop)->RangeMultiplier(2)->Range(2, 8<<4); BENCHMARK(BM_for_loop)->Range(8<<4, 8<<20); BENCHMARK_MAIN() ```
msg288816 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-03-02 17:12
The purpose of the issue is unclear to me. Why do you want to replace while with for? For readability? I agree that "n++; while (--n) ..." is surprising. It seems strange to start with a "n++" to decrement it in the next instruction :-) It's also unusual to write loops like that. > I have failed to find previous discussion when `while (n--)` was changed to `n++; while (--n)`. You should ask Raymond Hettinger for the rationale. The commit message says "better generated code (on both GCC and CLang)", but there is no benchmark nor any data to validate this. > I have made a benchmark. Nowadays, C compilers are very smart and implement crazy optimizations. Python releases are compiled using PGO, or even PGO+LTO. Please compile your benchmark using PGO. I expect that compilers emit the same machine code at the end for "while" and "for" loops. The question is also if your benchmark is revelant for the _collections module. What should I see in your benchmark? I see that results are the same for while and for_loop except a minor noise in the benchmark. -- I also dislike spending time on such minor micro-optimization...
msg288817 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-03-02 17:15
Oh, a little bit more context: Raymond Hettinger is the maintainer of _collectionsmodule.c and spent a lot of time to optimize this file!
msg288818 - (view) Author: Nikita Kniazev (Kojoley) * Date: 2017-03-02 17:28
> The purpose of the issue is unclear to me. I was asked to open an issue by Serhiy Storchaka on the GitHub PR. > Why do you want to replace while with for? For readability? Yes, I have open a PR just to improve the readability, because I was surprised by this incrementing-decrementing statements like you. > You should ask Raymond Hettinger for the rationale. The commit message says "better generated code (on both GCC and CLang)", but there is no benchmark nor any data to validate this. The purpose is clear to me - to eliminate postincrement and temporary variable that it requires. > Nowadays, C compilers are very smart and implement crazy optimizations. Python releases are compiled using PGO, or even PGO+LTO. Please compile your benchmark using PGO. I expect that compilers emit the same machine code at the end for "while" and "for" loops. > The question is also if your benchmark is revelant for the _collections module. > What should I see in your benchmark? I see that results are the same for while and for_loop except a minor noise in the benchmark. Benchmark was made to show that location of cmp+jmp location (look at asm output for more info) makes no sense to performance. Actually I do not want to spend more time on a such minor change.
msg288819 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-03-02 17:31
> Yes, I have open a PR just to improve the readability, because I was surprised by this incrementing-decrementing statements like you. Honestly, I don't really care of the exact syntax of C loops in _collectionmodule.c. But if your patch is rejected (right now, I'm neutral), it would be nice to add a comment with a reference to this issue on the loops to avoid other questions in the future.
msg288838 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-03-02 22:21
Sorry, but I'm going to keep the code as-is. To mine eyes (the maintainer), the while-loop better expresses my intention (a decrement-skip-on-zero step). On at least one compiler (GCC-6), it allows better code to generated (a sub-and-test instead of a sub-compare-and-test).
History
Date User Action Args
2022-04-11 14:58:43 admin set github: 73884
2017-03-02 22:21:41 rhettinger set status: open -> closedfiles: + _collectionsmodule.smessages: + resolution: not a bugstage: patch review -> resolved
2017-03-02 17:31:53 vstinner set messages: +
2017-03-02 17:28:17 Kojoley set messages: +
2017-03-02 17:15:26 vstinner set messages: +
2017-03-02 17:12:08 vstinner set nosy: + vstinnermessages: +
2017-03-02 17:00:12 serhiy.storchaka set versions: - Python 3.6nosy: + rhettinger, serhiy.storchakaassignee: rhettingercomponents: + Extension Modules, - Interpreter Corestage: patch review
2017-03-02 16:46:33 Kojoley create