Optimize ToString
implementation for integers by GuillaumeGomez · Pull Request #136264 · rust-lang/rust (original) (raw)
What benchmarks are we using to evaluate "make it faster"? I don't see results in this PR (and e.g. the description explicitly calls out not being sure how to check).
I posted a comparison with and without these changes in this comment. A lot less of assembly code is generated, however, just this difference is not enough to know the impact on performance. I wrote benchmarks but this is not my specialty and all I could get was a 1-2% performance difference (which is already significant, but did I write the benches correctly? That's another question).
Does the formatter flag checking not get optimized out after inlining? If not, maybe we can improve on that, for example by dispatching early on defaults or similar?
No and unless you add a new field to Formatter
allowing you to know that no field was updated and that the checks should be skipped, I don't see how you could get this optimization.
I'm not personally convinced that having N different implementations (can you confirm all of the cases are at least equally covered by tests?) is worth what I'd expect to be marginal improvements (this is where concrete numbers would be useful to justify this), especially when I'd expect that in most cases if you want the fastest possible serialization, ToString is not what you want -- it's pretty unlikely that an owned String containing just the integer is all you need, and the moment you want more than that you're going to be reaching for e.g. itoa to avoid heap allocations etc.
I'm not convinced either but since the optimize_for_size
feature flag exists, I need to deal with it. As for test coverage, no idea for optimize_for_size
but the conversions are tested in the "normal" case. But in any case, this code doesn't change the behaviour of optimize_for_size
so on this side we're good.
Also, I'm not looking for the fastest implementation, I'm looking at improving the current situation which is really suboptimal. We could add a new write_into<W: Write>(self, &mut W)
method to have something as fast as itoa
, but that's a whole other discussion and I don't plan to start it. My plan ends with this PR. Also to be noted: with this PR, the only remaining difference with itoa
is that we don't allow to write an integer into a String, everything else is the exact same.
Anyway, the PR is ready, it has a visible impact on at least the generated assembly which is notably smaller by allowing to skip all Formatter
code. It doesn't change the behaviour of optimize_for_size
and adds a very small amount of code. Having this kind of small optimization in places like integers to string optimization is always very welcome.