Improve Encoding.UTF8.GetString / GetBytes performance for small inputs by GrabYourPitchforks · Pull Request #27268 · dotnet/coreclr
This improves the performance of `Encoding.UTF8.GetString(byte[]) : string` and `Encoding.UTF8.GetBytes(string) : byte[]` by building on the existing JIT devirtualization logic and taking advantage of the fact that most inputs to these methods are likely to be small (32 elements or fewer). For small inputs like these, we already know that the worst-case transcoded output fits nicely into a stackalloc'd buffer, so we can avoid the counting step and move straight to transcoding + the final memcpy.
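To make the idea concrete, here's a minimal sketch of the small-input fast path. The class name, the `MaxSmallInputLength` cutoff, and the empty-input handling are illustrative assumptions on my part, not the actual implementation (which lives on the runtime's internal UTF-8 encoding type and is more careful than this):

```csharp
using System;
using System.Text;

// Illustrative sketch only; not the coreclr implementation.
internal sealed class FastUtf8Encoding : UTF8Encoding
{
    private const int MaxSmallInputLength = 32; // hypothetical fast-path cutoff

    public override unsafe string GetString(byte[] bytes)
    {
        if (bytes is null)
            throw new ArgumentNullException(nameof(bytes));
        if (bytes.Length == 0)
            return string.Empty;

        if (bytes.Length <= MaxSmallInputLength)
        {
            // Worst case, UTF-8 -> UTF-16 produces one char per input byte, so a
            // fixed-size stack buffer is always large enough. That lets us skip the
            // separate GetCharCount pass and transcode the input in a single step.
            char* pChars = stackalloc char[MaxSmallInputLength];
            fixed (byte* pBytes = bytes)
            {
                int charCount = GetChars(pBytes, bytes.Length, pChars, MaxSmallInputLength);
                return new string(pChars, 0, charCount); // final allocation + copy
            }
        }

        return base.GetString(bytes); // large inputs keep the existing two-pass path
    }
}
```

The trade-off is a small fixed stack reservation in exchange for walking the input only once; anything over the cutoff falls back to the existing count-then-convert logic.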
| Method | Toolchain | Text | Mean | Error | StdDev | Median | Ratio | RatioSD |
|---|---|---|---|---|---|---|---|---|
| GetString_FromByteArray | master | | 1.098 ns | 0.0619 ns | 0.0549 ns | 1.075 ns | 1.00 | 0.00 |
| GetString_FromByteArray | proto | | 8.682 ns | 0.2368 ns | 0.1978 ns | 8.664 ns | 7.91 | 0.48 |
| GetByteArray_FromString | master | | 21.109 ns | 0.5399 ns | 1.4032 ns | 20.549 ns | 1.00 | 0.00 |
| GetByteArray_FromString | proto | | 8.081 ns | 0.1581 ns | 0.1401 ns | 8.025 ns | 0.36 | 0.03 |
| GetString_FromByteArray | master | Hello | 23.182 ns | 0.3626 ns | 0.3028 ns | 23.191 ns | 1.00 | 0.00 |
| GetString_FromByteArray | proto | Hello | 18.949 ns | 0.4584 ns | 1.1997 ns | 18.436 ns | 0.88 | 0.06 |
| GetByteArray_FromString | master | Hello | 23.336 ns | 0.5520 ns | 1.0232 ns | 22.987 ns | 1.00 | 0.00 |
| GetByteArray_FromString | proto | Hello | 16.103 ns | 0.1553 ns | 0.1377 ns | 16.098 ns | 0.67 | 0.03 |
| GetString_FromByteArray | master | Hello world! | 27.745 ns | 0.6358 ns | 0.6803 ns | 27.784 ns | 1.00 | 0.00 |
| GetString_FromByteArray | proto | Hello world! | 20.178 ns | 0.2909 ns | 0.2429 ns | 20.168 ns | 0.73 | 0.02 |
| GetByteArray_FromString | master | Hello world! | 26.552 ns | 0.6135 ns | 1.4935 ns | 25.959 ns | 1.00 | 0.00 |
| GetByteArray_FromString | proto | Hello world! | 16.829 ns | 0.4128 ns | 0.6303 ns | 16.582 ns | 0.63 | 0.04 |
| GetString_FromByteArray | master | Lorem(...)elit. [56] | 38.538 ns | 0.3517 ns | 0.3290 ns | 38.418 ns | 1.00 | 0.00 |
| GetString_FromByteArray | proto | Lorem(...)elit. [56] | 38.339 ns | 0.4026 ns | 0.3766 ns | 38.403 ns | 0.99 | 0.01 |
| GetByteArray_FromString | master | Lorem(...)elit. [56] | 34.777 ns | 0.8004 ns | 1.2695 ns | 34.059 ns | 1.00 | 0.00 |
| GetByteArray_FromString | proto | Lorem(...)elit. [56] | 36.436 ns | 0.8080 ns | 1.3937 ns | 35.688 ns | 1.05 | 0.06 |
| GetString_FromByteArray | master | Nǐhǎo 你好 | 44.171 ns | 0.8742 ns | 0.7749 ns | 44.235 ns | 1.00 | 0.00 |
| GetString_FromByteArray | proto | Nǐhǎo 你好 | 30.719 ns | 0.3818 ns | 0.3188 ns | 30.668 ns | 0.70 | 0.01 |
| GetByteArray_FromString | master | Nǐhǎo 你好 | 42.092 ns | 0.6221 ns | 0.4857 ns | 42.092 ns | 1.00 | 0.00 |
| GetByteArray_FromString | proto | Nǐhǎo 你好 | 27.424 ns | 0.2978 ns | 0.2487 ns | 27.441 ns | 0.65 | 0.01 |
| GetString_FromByteArray | master | Γεια σου κόσμε | 47.637 ns | 0.5525 ns | 0.4614 ns | 47.535 ns | 1.00 | 0.00 |
| GetString_FromByteArray | proto | Γεια σου κόσμε | 31.119 ns | 0.4889 ns | 0.4334 ns | 31.074 ns | 0.65 | 0.01 |
| GetByteArray_FromString | master | Γεια σου κόσμε | 45.591 ns | 0.4591 ns | 0.4295 ns | 45.518 ns | 1.00 | 0.00 |
| GetByteArray_FromString | proto | Γεια σου κόσμε | 30.522 ns | 0.6794 ns | 0.8592 ns | 30.236 ns | 0.67 | 0.02 |
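For context on reproducing the numbers above: a BenchmarkDotNet harness along the lines below measures the same two operations. The parameter values and member names are my approximation of the table, not necessarily the exact benchmark project used, and the master/proto comparison comes from pointing separate runs at two different runtime builds, which this snippet doesn't show.

```csharp
using System.Text;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// Approximate harness for the table above; the actual benchmark project may differ.
public class Utf8SmallInputBenchmarks
{
    private byte[] _utf8Bytes;

    [Params("", "Hello", "Hello world!", "Nǐhǎo 你好", "Γεια σου κόσμε")]
    public string Text { get; set; }

    [GlobalSetup]
    public void Setup() => _utf8Bytes = Encoding.UTF8.GetBytes(Text);

    [Benchmark]
    public string GetString_FromByteArray() => Encoding.UTF8.GetString(_utf8Bytes);

    [Benchmark]
    public byte[] GetByteArray_FromString() => Encoding.UTF8.GetBytes(Text);

    public static void Main() => BenchmarkRunner.Run<Utf8SmallInputBenchmarks>();
}
```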
Marked WIP because it's not fully tested and because I'm trying to figure out whether it would make sense to provide additional overloads beyond these two. In my experience, the two overloads under consideration here are far and away the most commonly used.
These methods are overridden on the internal sealed type rather than on the `UTF8Encoding` base type because the base type is configurable in unexpected ways. For example, somebody may have configured a `UTF8Encoding` instance with a custom fallback mechanism, or they may have overridden a virtual method in a manner we're not anticipating. Putting this logic on the internal sealed type instead of the base type sidesteps those potential problems.
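As a concrete illustration of that configurability, both instances below are perfectly valid uses of the public surface area, and neither is the fixed-configuration singleton behind `Encoding.UTF8`, so under this change neither would take the new fast path:

```csharp
using System;
using System.Text;

// Two ways callers can configure UTF-8 decoding that a shortcut must respect.
var throwOnInvalid = new UTF8Encoding(
    encoderShouldEmitUTF8Identifier: false,
    throwOnInvalidBytes: true);               // exception fallback instead of U+FFFD

Encoding customFallback = Encoding.GetEncoding(
    "utf-8",
    new EncoderReplacementFallback("?"),
    new DecoderReplacementFallback("?"));     // user-supplied replacement fallbacks

byte[] invalidUtf8 = { 0xC3, 0x28 };          // truncated two-byte sequence

Console.WriteLine(customFallback.GetString(invalidUtf8));     // "?(" via the custom fallback
try { throwOnInvalid.GetString(invalidUtf8); }
catch (DecoderFallbackException) { Console.WriteLine("throws, as configured"); }

// The default singleton always has the same well-known configuration, which is
// what makes it safe to special-case on the internal sealed type.
Console.WriteLine(Encoding.UTF8.GetString(invalidUtf8));      // "\uFFFD(" replacement char + '('
```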
It's also possible that we don't want to do this because it represents duplication of logic we'd rather not have. That's understandable - I'm primarily putting this out there to gauge the temperature of the response.