Optimize insertion sort · Pull Request #40807 · rust-lang/rust
Conversation
This change slightly changes the main iteration loop so that LLVM can optimize it more efficiently.
Benchmark:
name before ns/iter after ns/iter diff ns/iter diff %
slice::sort_unstable_small_ascending 39 (2051 MB/s) 38 (2105 MB/s) -1 -2.56%
slice::sort_unstable_small_big_random 579 (2210 MB/s) 575 (2226 MB/s) -4 -0.69%
slice::sort_unstable_small_descending 80 (1000 MB/s) 70 (1142 MB/s) -10 -12.50%
slice::sort_unstable_small_random 396 (202 MB/s) 386 -10 -2.53%
The benchmark is not a fluke: performance on small_descending is consistently better after this change. I'm not 100% sure why this makes things faster, but my guess is that the compiler sees v.len()+1
as an expression that could in theory overflow, which gets in the way of optimization.
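For context, here is a minimal sketch of the two loop shapes involved. This is not the actual libcore code; `insert_into_sorted_prefix` is an illustrative helper. The old shape computes the upper bound as `v.len() + 1`, while the new shape iterates over `1..v.len()` and avoids the addition entirely:

```rust
/// Inserts `v[i]` into the already-sorted prefix `v[..i]`.
fn insert_into_sorted_prefix<T: Ord>(v: &mut [T], i: usize) {
    let mut j = i;
    while j > 0 && v[j - 1] > v[j] {
        v.swap(j - 1, j);
        j -= 1;
    }
}

/// Old shape: the upper bound is `v.len() + 1`, an expression the compiler
/// must treat as potentially overflowing.
fn insertion_sort_old<T: Ord>(v: &mut [T]) {
    for i in 2..v.len() + 1 {
        insert_into_sorted_prefix(&mut v[..i], i - 1);
    }
}

/// New shape: iterate over `1..v.len()` and insert element `i` directly,
/// so no `+ 1` appears in the loop bound.
fn insertion_sort_new<T: Ord>(v: &mut [T]) {
    for i in 1..v.len() {
        insert_into_sorted_prefix(v, i);
    }
}

fn main() {
    let mut a = [5, 3, 8, 1, 9, 2];
    insertion_sort_old(&mut a);
    assert_eq!(a, [1, 2, 3, 5, 8, 9]);

    let mut b = [5, 3, 8, 1, 9, 2];
    insertion_sort_new(&mut b);
    assert_eq!(b, [1, 2, 3, 5, 8, 9]);
}
```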
r? @aturon
(rust_highfive has picked a reviewer for you, use r? to override)
📌 Commit 2c816f7 has been approved by alexcrichton
alexcrichton added a commit to alexcrichton/rust that referenced this pull request
…=alexcrichton
Optimize insertion sort
bors added a commit that referenced this pull request
Rollup of 11 pull requests
- Successful merges: #40347, #40501, #40516, #40524, #40540, #40642, #40683, #40764, #40778, #40807, #40809
- Failed merges: #40771
Since this is internal to libstd/core, could you check whether the inclusive range syntax makes things better as well?
i.e. 2 ... v.len()
@nagisa Inclusive range syntax makes performance slightly worse, actually...
With the old insertion sort and inclusive range syntax, there's a bounds check at the beginning of the insertion sort. This PR removes that bounds check.
If you want to play with this, here's a playpen link.
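For reference, here is a sketch of what the inclusive-range variant looks like, written with today's `..=` syntax (in 2017 it was spelled `...`). This is only illustrative, not the code from the playpen link, but it shows where the slicing bounds check can survive:

```rust
// Illustrative only: the inclusive-range variant of the old loop shape.
fn insertion_sort_inclusive<T: Ord>(v: &mut [T]) {
    for i in 2..=v.len() {
        // `v[..i]` still needs `i <= v.len()` to be proven; with this loop
        // shape the compiler may keep a bounds check that the shape in
        // this PR avoids.
        let prefix = &mut v[..i];
        let mut j = prefix.len() - 1;
        while j > 0 && prefix[j - 1] > prefix[j] {
            prefix.swap(j - 1, j);
            j -= 1;
        }
    }
}
```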
⌛ Testing commit 2c816f7 with merge 04e47d7...
frewsxcv added a commit to frewsxcv/rust that referenced this pull request
…=alexcrichton
Optimize insertion sort
frewsxcv added a commit to frewsxcv/rust that referenced this pull request
…=alexcrichton
Optimize insertion sort
bors added a commit that referenced this pull request
ghost deleted the optimize-insertion-sort branch