Implement P3107R5 optimized <print>
by blackninja9939 · Pull Request #4821 · microsoft/STL
Fixes #4509
This is my first PR, so I expect plenty of fun times and issues 😅
In my benchmarking, the non-buffered version is about 400-700 ns faster. That is less than I had hoped for, but I do not think the design of format and the Windows write APIs leaves much room for large gains.
The format API ties the iterator type to everything, so we must always go through _Fmt_iterator_buffer, which writes chars into a buffer and flushes on demand. That makes _fputc_nolock hard to use: nothing is ever written character by character to the wrapped iterator, and its type matters.
Instead, I've generalised the optimization that vector and string were already using, a customization point on the wrapped iterator that is invoked when we flush. These custom iterators use it to write to the stream or to the console while the parent holds the lock. This avoids ever allocating the final std::string or re-taking the lock.
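To make the idea concrete, here is a minimal sketch of a buffer that hands each full chunk to a flush hook on its target. It is not the code in this PR: chunk_buffer, string_target, file_target, and chunk_recorder are hypothetical names, and the real implementation goes through _Fmt_iterator_buffer and takes the stream lock in the parent, which this sketch leaves out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical flush targets. Each one only has to provide flush(chunk).
struct string_target {
    std::string& out;
    void flush(std::string_view chunk) { out.append(chunk); }
};

struct file_target {
    std::FILE* stream; // assumption: the caller already holds the stream lock
    void flush(std::string_view chunk) {
        std::fwrite(chunk.data(), 1, chunk.size(), stream);
    }
};

struct chunk_recorder {
    std::vector<std::string>& chunks; // records each flushed chunk separately
    void flush(std::string_view chunk) { chunks.emplace_back(chunk); }
};

// Hypothetical stand-in for the buffering layer: chars accumulate in a fixed
// array and go to the target in bulk, never one character at a time.
template <class Target, std::size_t N = 256>
class chunk_buffer {
public:
    explicit chunk_buffer(Target target) : target_(target) {}
    chunk_buffer(const chunk_buffer&) = delete;
    chunk_buffer& operator=(const chunk_buffer&) = delete;

    // Flushes whatever is left. A throwing target would terminate here,
    // which is acceptable for a sketch but not for library code.
    ~chunk_buffer() { flush(); }

    void push_back(char c) {
        if (size_ == N) {
            flush(); // buffer is full: hand the whole chunk to the target
        }
        data_[size_++] = c;
    }

    void append(std::string_view text) {
        for (char c : text) {
            push_back(c);
        }
    }

    void flush() {
        if (size_ != 0) {
            target_.flush(std::string_view(data_, size_));
            size_ = 0;
        }
    }

private:
    Target target_;
    char data_[N];
    std::size_t size_ = 0;
};
```

With a 4-char buffer, appending "abcdefghij" to a chunk_recorder produces the chunks "abcd", "efgh", and "ij" (the last one when the buffer is destroyed). Swapping in file_target sends the same chunks straight to the stream, so no intermediate std::string is ever built.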
If we write to the console, we must additionally transcode to wchar_t so that we can call WriteConsoleW. I've reduced allocation there as well, since the transcoded text can often fit in a buffer, but it is still extra work being done.
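A rough sketch of the "fits in a buffer, otherwise allocate" pattern for the console path follows. It is an illustration under assumptions, not the PR's code: with_wide_buffer and write_utf8_to_console are hypothetical helpers, the 256-element stack size is arbitrary, the input is assumed to be UTF-8 and small enough for its size to fit in an int, and a chunk that ends in the middle of a multi-byte sequence is not handled. The Windows-only part is guarded so the helper itself stays portable.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string_view>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: gives `fn` a wchar_t buffer with room for `needed`
// elements. Uses stack storage when it fits and the heap otherwise.
// Returns true when a heap allocation was required.
template <std::size_t StackCount = 256, class Fn>
bool with_wide_buffer(std::size_t needed, Fn&& fn) {
    if (needed <= StackCount) {
        wchar_t stack_buf[StackCount];
        fn(stack_buf, needed);
        return false;
    }
    auto heap_buf = std::make_unique<wchar_t[]>(needed);
    fn(heap_buf.get(), needed);
    return true;
}

#ifdef _WIN32
// Windows-only: transcode a UTF-8 chunk to UTF-16 and write it to a console.
inline bool write_utf8_to_console(HANDLE console, std::string_view utf8) {
    if (utf8.empty()) {
        return true;
    }
    const int src_len = static_cast<int>(utf8.size());

    // First call: ask how many wchar_t elements the result needs.
    const int wide_len =
        MultiByteToWideChar(CP_UTF8, 0, utf8.data(), src_len, nullptr, 0);
    if (wide_len <= 0) {
        return false;
    }

    bool ok = false;
    with_wide_buffer(static_cast<std::size_t>(wide_len),
                     [&](wchar_t* buf, std::size_t count) {
        // Second call: transcode into the stack or heap buffer.
        const int written = MultiByteToWideChar(
            CP_UTF8, 0, utf8.data(), src_len, buf, static_cast<int>(count));
        DWORD chars_written = 0;
        ok = written > 0 &&
             WriteConsoleW(console, buf, static_cast<DWORD>(written),
                           &chars_written, nullptr) != 0;
    });
    return ok;
}
#endif
```

Short chunks take the stack path and never touch the allocator. Only a chunk whose transcoded length exceeds the stack size pays for a heap allocation.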
It's still a speedup, and I'm sure there is room for further gains, but I tried various approaches and this was the "simplest" one and the one that reduced allocation calls the most.