Improve Encoding.UTF8.GetString / GetChars performance for small inputs by GrabYourPitchforks · Pull Request #27268 · dotnet/coreclr

This improves the performance of Encoding.UTF8.GetString(byte[]) : string and Encoding.UTF8.GetBytes(string) : byte[] by building on the existing JIT devirtualization logic and taking advantage of the fact that most inputs to these methods are likely to be small (32 elements or fewer: bytes for GetString, chars for GetBytes). For inputs this small, we already know that the worst-case output size fits comfortably in a stackalloc buffer, so we can skip the counting step (the pass that computes the exact output length before allocating) and go straight to transcoding into the stack buffer, followed by a final memcpy into the result.
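
A minimal sketch of the small-input idea for the GetBytes direction, written in Java for illustration only. The class and method names are hypothetical, and because Java has no stackalloc, a fixed-size scratch array stands in for the stack buffer. It assumes a worst case of 3 UTF-8 bytes per UTF-16 code unit (a surrogate pair is 2 units and 4 bytes; a lone surrogate becomes U+FFFD, 3 bytes), so 32 units never need more than 96 bytes and the counting pass can be skipped.

```java
import java.util.Arrays;

// Hypothetical class: a Java model of the small-input fast path, not the CoreLib code.
final class SmallUtf8Encoder {
    // Threshold from the PR description: 32 elements (here, UTF-16 code units) or fewer.
    static final int SMALL_INPUT_THRESHOLD = 32;
    // Worst case per UTF-16 code unit: 3 UTF-8 bytes.
    static final int MAX_BYTES_PER_CHAR = 3;

    static byte[] getBytes(String s) {
        if (s.length() <= SMALL_INPUT_THRESHOLD) {
            // Fast path: no counting pass. Transcode once into a worst-case-sized
            // scratch buffer (standing in for stackalloc), then copy out the used prefix.
            byte[] scratch = new byte[SMALL_INPUT_THRESHOLD * MAX_BYTES_PER_CHAR];
            int written = transcode(s, scratch);
            return Arrays.copyOf(scratch, written);
        }
        // General path: count the exact output size, allocate, then transcode.
        byte[] result = new byte[countUtf8Bytes(s)];
        transcode(s, result);
        return result;
    }

    // Counting pass, used only by the general path.
    static int countUtf8Bytes(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                count += 1;
            } else if (c < 0x800) {
                count += 2;
            } else if (Character.isHighSurrogate(c) && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                count += 4; // valid surrogate pair
                i++;
            } else {
                count += 3; // other BMP char, or lone surrogate replaced by U+FFFD
            }
        }
        return count;
    }

    // Writes UTF-8 into dst and returns the number of bytes written.
    static int transcode(String s, byte[] dst) {
        int pos = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            int cp = c;
            if (Character.isHighSurrogate(c) && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                cp = Character.toCodePoint(c, s.charAt(++i));
            } else if (Character.isSurrogate(c)) {
                cp = 0xFFFD; // ill-formed UTF-16: substitute the replacement character
            }
            if (cp < 0x80) {
                dst[pos++] = (byte) cp;
            } else if (cp < 0x800) {
                dst[pos++] = (byte) (0xC0 | (cp >>> 6));
                dst[pos++] = (byte) (0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {
                dst[pos++] = (byte) (0xE0 | (cp >>> 12));
                dst[pos++] = (byte) (0x80 | ((cp >>> 6) & 0x3F));
                dst[pos++] = (byte) (0x80 | (cp & 0x3F));
            } else {
                dst[pos++] = (byte) (0xF0 | (cp >>> 18));
                dst[pos++] = (byte) (0x80 | ((cp >>> 12) & 0x3F));
                dst[pos++] = (byte) (0x80 | ((cp >>> 6) & 0x3F));
                dst[pos++] = (byte) (0x80 | (cp & 0x3F));
            }
        }
        return pos;
    }
}
```

The trade-off is one pass plus a small copy instead of two passes; for inputs above the threshold the worst-case buffer would grow with the input, so the sketch falls back to count-then-transcode there.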

| Method | Toolchain | Text | Mean (ns) | Error (ns) | StdDev (ns) | Median (ns) | Ratio | RatioSD |
|---|---|---|---:|---:|---:|---:|---:|---:|
| GetString_FromByteArray | master | | 1.098 | 0.0619 | 0.0549 | 1.075 | 1.00 | 0.00 |
| GetString_FromByteArray | proto | | 8.682 | 0.2368 | 0.1978 | 8.664 | 7.91 | 0.48 |
| GetByteArray_FromString | master | | 21.109 | 0.5399 | 1.4032 | 20.549 | 1.00 | 0.00 |
| GetByteArray_FromString | proto | | 8.081 | 0.1581 | 0.1401 | 8.025 | 0.36 | 0.03 |
| GetString_FromByteArray | master | Hello | 23.182 | 0.3626 | 0.3028 | 23.191 | 1.00 | 0.00 |
| GetString_FromByteArray | proto | Hello | 18.949 | 0.4584 | 1.1997 | 18.436 | 0.88 | 0.06 |
| GetByteArray_FromString | master | Hello | 23.336 | 0.5520 | 1.0232 | 22.987 | 1.00 | 0.00 |
| GetByteArray_FromString | proto | Hello | 16.103 | 0.1553 | 0.1377 | 16.098 | 0.67 | 0.03 |
| GetString_FromByteArray | master | Hello world! | 27.745 | 0.6358 | 0.6803 | 27.784 | 1.00 | 0.00 |
| GetString_FromByteArray | proto | Hello world! | 20.178 | 0.2909 | 0.2429 | 20.168 | 0.73 | 0.02 |
| GetByteArray_FromString | master | Hello world! | 26.552 | 0.6135 | 1.4935 | 25.959 | 1.00 | 0.00 |
| GetByteArray_FromString | proto | Hello world! | 16.829 | 0.4128 | 0.6303 | 16.582 | 0.63 | 0.04 |
| GetString_FromByteArray | master | Lorem(...)elit. [56] | 38.538 | 0.3517 | 0.3290 | 38.418 | 1.00 | 0.00 |
| GetString_FromByteArray | proto | Lorem(...)elit. [56] | 38.339 | 0.4026 | 0.3766 | 38.403 | 0.99 | 0.01 |
| GetByteArray_FromString | master | Lorem(...)elit. [56] | 34.777 | 0.8004 | 1.2695 | 34.059 | 1.00 | 0.00 |
| GetByteArray_FromString | proto | Lorem(...)elit. [56] | 36.436 | 0.8080 | 1.3937 | 35.688 | 1.05 | 0.06 |
| GetString_FromByteArray | master | Nǐhǎo 你好 | 44.171 | 0.8742 | 0.7749 | 44.235 | 1.00 | 0.00 |
| GetString_FromByteArray | proto | Nǐhǎo 你好 | 30.719 | 0.3818 | 0.3188 | 30.668 | 0.70 | 0.01 |
| GetByteArray_FromString | master | Nǐhǎo 你好 | 42.092 | 0.6221 | 0.4857 | 42.092 | 1.00 | 0.00 |
| GetByteArray_FromString | proto | Nǐhǎo 你好 | 27.424 | 0.2978 | 0.2487 | 27.441 | 0.65 | 0.01 |
| GetString_FromByteArray | master | Γεια σου κόσμε | 47.637 | 0.5525 | 0.4614 | 47.535 | 1.00 | 0.00 |
| GetString_FromByteArray | proto | Γεια σου κόσμε | 31.119 | 0.4889 | 0.4334 | 31.074 | 0.65 | 0.01 |
| GetByteArray_FromString | master | Γεια σου κόσμε | 45.591 | 0.4591 | 0.4295 | 45.518 | 1.00 | 0.00 |
| GetByteArray_FromString | proto | Γεια σου κόσμε | 30.522 | 0.6794 | 0.8592 | 30.236 | 0.67 | 0.02 |

Marked WIP because it's not fully tested and I'm still trying to figure out whether it would make sense to provide additional overloads beyond these two. In my experience, these two methods are far and away the most commonly called.

These methods are overridden on the internal sealed type rather than on the UTF8Encoding base type, because the base type can be configured or extended in ways we can't anticipate. For example, somebody may have configured a UTF8Encoding instance with a custom fallback mechanism, or a subclass may override a virtual method in a way we don't expect. Putting this logic in the internal sealed type instead of the base type sidesteps these potential problems.
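
To make the design concrete, here is a Java sketch of why the shortcut belongs on the sealed type. All names are hypothetical and this is not the CoreLib type hierarchy: a configurable replacement character loosely plays the role of a custom fallback, Java's final stands in for C#'s sealed, and `new String(byte[], UTF_8)` stands in for the specialized fast path.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical public, subclassable codec: callers choose the replacement used
// for invalid input (loosely analogous to a custom decoder fallback).
class ConfigurableUtf8Codec {
    private final String replacement;

    ConfigurableUtf8Codec(String replacement) {
        // Note: CharsetDecoder.replaceWith for UTF-8 accepts only a single-char
        // replacement, because the UTF-8 decoder's maxCharsPerByte() is 1.0.
        this.replacement = replacement;
    }

    // General path: honors this instance's configuration. Non-final, so a user
    // subclass may override it with behavior we cannot predict.
    public String decode(byte[] bytes) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPLACE)
                    .onUnmappableCharacter(CodingErrorAction.REPLACE)
                    .replaceWith(replacement)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            throw new IllegalStateException(e); // not expected with REPLACE actions
        }
    }
}

// Hypothetical internal default instance: final (nobody can subclass it) and
// with a fixed configuration, so its behavior is fully known.
final class DefaultUtf8Codec extends ConfigurableUtf8Codec {
    static final DefaultUtf8Codec INSTANCE = new DefaultUtf8Codec();

    private DefaultUtf8Codec() {
        super("\uFFFD");
    }

    @Override
    public String decode(byte[] bytes) {
        // Safe shortcut: the replacement here is always U+FFFD, which is also what
        // new String(byte[], UTF_8) substitutes for malformed input.
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

A user-configured instance such as `new ConfigurableUtf8Codec("?")` keeps its own replacement because it never reaches the shortcut; only the final default type can take it without changing observable behavior.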

It's also possible that we don't want to do this at all, since it duplicates logic we'd rather not maintain in two places. That's understandable; I'm primarily putting this out there to gauge the reaction.