Optimize core::str::Lines::count by thomcc · Pull Request #123606 · rust-lang/rust
rustbot added S-waiting-on-review
Status: Awaiting review from the assignee but also interested parties.
Relevant to the library team, which will review and decide on the PR/issue.
labels
thomcc added S-waiting-on-author
Status: This is awaiting some action (such as code changes or more information) from the author.
and removed S-waiting-on-review
Status: Awaiting review from the assignee but also interested parties.
labels
bors added a commit to rust-lang-ci/rust that referenced this pull request
Optimize core::str::Lines::count
`s.lines().count()+1` is somewhat common as a way to find the line number given a byte position, so it'd be nice if it were faster.
This just generalizes the SWAR-optimized char counting code so that it can also be used for SWAR-optimized line counting, so the PR is actually not very complex.
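To illustrate the general idea (this is a minimal sketch, not the code in this PR or in libcore), SWAR line counting can be done by counting `\n` bytes eight at a time in a `u64` and then adjusting for a final line that has no trailing newline. The names `count_newlines_swar` and `count_lines` are hypothetical, and the fixed `u64` word size and the `chunks_exact` loop are assumptions made for brevity.

```rust
/// Counts `\n` bytes, processing 8 bytes per step inside a single u64.
fn count_newlines_swar(bytes: &[u8]) -> usize {
    const NL: u64 = 0x0A0A_0A0A_0A0A_0A0A; // b'\n' repeated in every byte
    const LOW7: u64 = 0x7F7F_7F7F_7F7F_7F7F; // low 7 bits of every byte

    let mut chunks = bytes.chunks_exact(8);
    let mut total = 0usize;
    for chunk in &mut chunks {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        let word = u64::from_ne_bytes(buf);

        // After the XOR, a byte is zero exactly where the input had `\n`.
        let x = word ^ NL;
        // The high bit of each byte in `nonzero` is set iff that byte of `x`
        // is nonzero. The per-byte sum is at most 0xFE, so no carry crosses
        // a byte boundary.
        let nonzero = ((x & LOW7) + LOW7) | x;
        // Keep one high bit per zero byte, then count those bits.
        total += (!nonzero & !LOW7).count_ones() as usize;
    }
    // Handle the last 0..=7 bytes one at a time.
    total + chunks.remainder().iter().filter(|&&b| b == b'\n').count()
}

/// Matches `s.lines().count()`: every `\n` ends a line, and a non-empty
/// tail without a trailing `\n` counts as one more line.
fn count_lines(s: &str) -> usize {
    let bytes = s.as_bytes();
    let newlines = count_newlines_swar(bytes);
    if bytes.is_empty() || bytes.last() == Some(&b'\n') {
        newlines
    } else {
        newlines + 1
    }
}
```

The work per word is the same whether or not it contains newlines, which is why a counter of this shape does not slow down as the newline density grows.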
TODO
- benchmarks
- adjust comments
- more tests
Benchmarks
`case00_libcore` is the new version, and `case01_fold_increment` is the previous implementation (the default impl of `Iterator::count()` is a fold that increments a counter).
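For reference, the baseline presumably looks something like the sketch below. The function name `count_lines_fold` is hypothetical, and the actual benchmark body is not shown in this PR description.

```rust
/// Counts lines the way the default `Iterator::count` does: a fold that
/// adds one for every item the iterator yields.
fn count_lines_fold(s: &str) -> usize {
    s.lines().fold(0, |count, _line| count + 1)
}
```

Each step of this fold has to locate the next line boundary and produce a `&str` for it, so its cost grows with the number of lines as well as with the number of bytes.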
str::line_count::all_newlines_32kib::case00_libcore 4.35µs/iter +/- 11.00ns
str::line_count::all_newlines_32kib::case01_fold_increment 779.99µs/iter +/- 8.43µs
str::line_count::all_newlines_4kib::case00_libcore 562.00ns/iter +/- 5.00ns
str::line_count::all_newlines_4kib::case01_fold_increment 97.81µs/iter +/- 1.48µs
str::line_count::all_newlines_64b::case00_libcore 21.00ns/iter +/- 0.00ns
str::line_count::all_newlines_64b::case01_fold_increment 1.49µs/iter +/- 32.00ns
str::line_count::en_huge::case00_libcore 45.58µs/iter +/- 122.00ns
str::line_count::en_huge::case01_fold_increment 167.62µs/iter +/- 609.00ns
str::line_count::en_large::case00_libcore 734.00ns/iter +/- 6.00ns
str::line_count::en_large::case01_fold_increment 2.62µs/iter +/- 9.00ns
str::line_count::en_medium::case00_libcore 100.00ns/iter +/- 0.00ns
str::line_count::en_medium::case01_fold_increment 347.00ns/iter +/- 0.00ns
str::line_count::en_small::case00_libcore 18.00ns/iter +/- 1.00ns
str::line_count::en_small::case01_fold_increment 60.00ns/iter +/- 2.00ns
str::line_count::en_tiny::case00_libcore 6.00ns/iter +/- 0.00ns
str::line_count::en_tiny::case01_fold_increment 60.00ns/iter +/- 0.00ns
str::line_count::zh_huge::case00_libcore 40.63µs/iter +/- 85.00ns
str::line_count::zh_huge::case01_fold_increment 205.10µs/iter +/- 1.62µs
str::line_count::zh_large::case00_libcore 655.00ns/iter +/- 1.00ns
str::line_count::zh_large::case01_fold_increment 3.21µs/iter +/- 21.00ns
str::line_count::zh_medium::case00_libcore 92.00ns/iter +/- 0.00ns
str::line_count::zh_medium::case01_fold_increment 420.00ns/iter +/- 2.00ns
str::line_count::zh_small::case00_libcore 20.00ns/iter +/- 1.00ns
str::line_count::zh_small::case01_fold_increment 63.00ns/iter +/- 1.00ns
str::line_count::zh_tiny::case00_libcore 6.00ns/iter +/- 0.00ns
str::line_count::zh_tiny::case01_fold_increment 21.00ns/iter +/- 0.00ns
This is a speedup of around 2x-4x most of the time, but for some highly unrealistic scenarios (32KiB of newlines) it's up to almost 200x faster. That is because the time taken by the version in this PR does not depend on the number of newlines in the input, whereas the old version gets slower the more newlines are present. It's also much faster for small inputs, especially if they have newlines (10x faster for en_tiny).
Real-world cases will vary, so don't read too much into these numbers. I would expect a 2x-4x speedup in general, since that's what it gets on the most realistic examples.
Obviously a SIMD impl will beat this, but users who are really bottlenecked on this operation should probably just reach for crates.io (even if we provided a SIMD version, libcore can't use runtime CPU feature detection so they'd still be better off with something from crates.io).
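As a rough illustration of the runtime detection that a crate built on `std` can do and libcore cannot, here is a sketch that uses the `is_x86_feature_detected!` macro from `std`. The function names are hypothetical, and the "AVX2" path just relies on the compiler auto-vectorizing a plain loop with AVX2 enabled; this is an assumption for brevity, not how any particular crate does it.

```rust
/// Portable fallback: checks one byte at a time.
fn count_newlines_scalar(bytes: &[u8]) -> usize {
    bytes.iter().filter(|&&b| b == b'\n').count()
}

/// Same loop, but compiled with AVX2 enabled so the compiler may vectorize it.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn count_newlines_avx2(bytes: &[u8]) -> usize {
    bytes.iter().filter(|&&b| b == b'\n').count()
}

/// Picks an implementation at runtime based on what the CPU supports.
fn count_newlines(bytes: &[u8]) -> usize {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("avx2") {
            // Sound because AVX2 support was just confirmed at runtime.
            return unsafe { count_newlines_avx2(bytes) };
        }
    }
    count_newlines_scalar(bytes)
}
```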