Added fast path to BinaryWriter.Write(string) by bbartels · Pull Request #37705 · dotnet/runtime (original) (raw)

This PR improves the code path for BinaryWriter.Write(string) especially in scenarios where the encoding specified is variable in length. The previous implementation made a blanket assumption that each character within the input string is always the maximum size of any given character in the output encoding; leading to inefficient packing of the byte buffer.

Due to this, the previous implementation performs poorly when writing characters that take up less space than the maximum size of a character in an encoding i.e. strings constrained to ASCII with a UTF-8 encoder.

If however an encoding is used, in which the input characters end up filling the byte buffer, the new implementation is quite a bit slower. This is why in such cases this PR delegates to the old implementation to not introduce any performance regressions.

The only thing left is to decide in which specific cases the previous path should be taken. As you can see on the benchmarks (below) the only instances in which regressions appear seem to be pertaining to ASCIIEncoding. A simple predicate could simply whitelist the encodings which do not experience any performance loss, but then the optimisation will not carry over to any custom encodings. Maybe someone has a better idea!

Also fixes: #37158

I ran numerous benchmarks in different scenarios, so the full table of benchmarks can be found here: https://gist.github.com/bbartels/7cc70209a64e01d4fa663546fe035e0a

The Benchmarks were run with a MemoryStream backing the BinaryWriter.
The Benchmarks were also run without the old implementation in place for ASCIIEncoding. If the current PR would be benched, the ASCIIEncoding benchmarks would have equivalent performance.

CharSize = Average char size of output encoding
Size = The Number of input chars

Below are a couple notable excerpts

Improvements:

Method Impl CharSize Size Encoding Mean Ratio
Write Old 1 Byte 500 UTF-8 545.4 ns 1.00
Write New 1 Byte 500 UTF-8 350.5 ns 0.64
Write Old 1 Byte 10000 UTF-8 7,022.4 ns 1.00
Write New 1 Byte 10000 UTF-8 3,706.2 ns 0.53
Write Old 1.1 Byte 500 UTF-8 754.0 ns 1.00
Write New 1.1 Byte 500 UTF-8 628.5 ns 0.83
Write Old 1.1 Byte 10000 UTF-8 10,878.3 ns 1.00
Write New 1.1 Byte 10000 UTF-8 8,280.9 ns 0.76
Write Old 4 Byte 1000000 UTF-16 4,926.7 ns 1.00
Write New 4 Byte 1000000 UTF-16 4,005.1 ns 0.81

Regressions:

Method Impl CharSize Size Encoding Mean Ratio
Write Old 1 Byte 10000 ASCII 2,557.0 ns 1.00
Write New 1 Byte 10000 ASCII 3,178.9 ns 1.24
Write Old 1 Byte 1000000 ASCII 240,853.0 ns 1.00
Write New 1 Byte 1000000 ASCII 299,498.3 ns 1.24