Optimize core loop of _Count_vbool by localspook · Pull Request #5640 · microsoft/STL

I mean, intuitively it looks like an improvement, but without benchmarking I can't be sure that the compiler doesn't do something clever on its own.

The optimization in this PR is not something the compiler is likely to do on its own, but the compiler might achieve the same effect by duplicating the loop for *_VbFirst and ~*_VbFirst and hoisting the condition out of the loop (loop unswitching).
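For illustration, here is roughly what that compiler transformation looks like on a simplified word-counting loop. The names `Word`, `popcount_word`, `count_words_branch_inside`, and `count_words_unswitched` are hypothetical stand-ins, not the STL's actual `_Count_vbool` code. The sketch only shows the shape of a loop before and after a loop-invariant condition is moved out of it.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>

// Hypothetical word type; the real vector<bool> storage word may differ.
using Word = std::uint64_t;

// Portable popcount of one word (C++20's std::popcount would also work).
inline std::size_t popcount_word(Word w) {
    return std::bitset<64>(w).count();
}

// Condition inside the loop: every iteration picks *first or ~*first,
// even though `value` never changes while the loop runs.
std::size_t count_words_branch_inside(const Word* first, const Word* last, bool value) {
    std::size_t total = 0;
    for (; first != last; ++first) {
        total += popcount_word(value ? *first : ~*first);
    }
    return total;
}

// Loop unswitching: test the loop-invariant `value` once, then run one of
// two duplicated loops, one counting *first and one counting ~*first.
std::size_t count_words_unswitched(const Word* first, const Word* last, bool value) {
    std::size_t total = 0;
    if (value) {
        for (; first != last; ++first) {
            total += popcount_word(*first);   // count set bits
        }
    } else {
        for (; first != last; ++first) {
            total += popcount_word(~*first);  // count clear bits
        }
    }
    return total;
}
```

Both functions return the same count for any input; whether a given compiler turns the first form into the second (and whether that is actually faster) is exactly what a benchmark would have to show.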

I don't think that is very likely to happen, and that's why we implemented the _Select_popcount_impl thing, but it would be good to prove it with measurements.
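As a rough starting point for such a measurement, here is a minimal wall-clock timing helper. `time_per_call_ns` and `g_benchmark_sink` are hypothetical names for illustration; the sketch assumes each candidate loop (for example, the old and new versions of the counting loop) is wrapped in a lambda over the same input. It is deliberately naive: it does not handle warm-up, CPU frequency scaling, or an optimizer that hoists a pure, inlined call out of the timing loop, which a dedicated framework such as Google Benchmark handles more carefully.

```cpp
#include <chrono>
#include <cstddef>

// Each result is stored through a volatile, so every iteration performs a
// store the compiler must keep.
volatile std::size_t g_benchmark_sink = 0;

// Hypothetical helper: average nanoseconds per call of fn() over `reps` calls.
template <class Fn>
double time_per_call_ns(Fn fn, int reps) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) {
        g_benchmark_sink = fn();  // keep a visible side effect per iteration
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(stop - start).count() / reps;
}
```

Comparing the two variants then means calling the helper once per lambda on identical data, repeating the run several times, and looking at the spread rather than at a single number.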