Improve Encoding.UTF8.GetString / GetBytes performance for small inputs by GrabYourPitchforks · Pull Request #27268 · dotnet/coreclr
This improves the performance of `Encoding.UTF8.GetString(byte[]) : string` and `Encoding.UTF8.GetBytes(string) : byte[]` by building on the existing JIT devirtualization logic and taking advantage of the fact that most inputs to these methods are likely to be small (32 elements or fewer). For small inputs like these, we already know that the worst-case transcoded output fits nicely into a stackalloc'd buffer, so we can avoid the counting step and move straight to transcoding + the final memcpy.
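To make the idea concrete, here's a minimal sketch of the small-input fast path. The class name, the `MaxSmallInputLength` cutoff, and the empty-input handling are illustrative assumptions on my part, not the actual implementation (which lives on the runtime's internal UTF-8 encoding type and is more careful than this):

```csharp
using System;
using System.Text;

// Illustrative sketch only; not the coreclr implementation.
internal sealed class FastUtf8Encoding : UTF8Encoding
{
    private const int MaxSmallInputLength = 32; // hypothetical fast-path cutoff

    public override unsafe string GetString(byte[] bytes)
    {
        if (bytes is null)
            throw new ArgumentNullException(nameof(bytes));
        if (bytes.Length == 0)
            return string.Empty;

        if (bytes.Length <= MaxSmallInputLength)
        {
            // Worst case, UTF-8 -> UTF-16 produces one char per input byte, so a
            // fixed-size stack buffer is always large enough. That lets us skip the
            // separate GetCharCount pass and transcode the input in a single step.
            char* pChars = stackalloc char[MaxSmallInputLength];
            fixed (byte* pBytes = bytes)
            {
                int charCount = GetChars(pBytes, bytes.Length, pChars, MaxSmallInputLength);
                return new string(pChars, 0, charCount); // final allocation + copy
            }
        }

        return base.GetString(bytes); // large inputs keep the existing two-pass path
    }
}
```

The trade-off is a small fixed stack reservation in exchange for walking the input only once; anything over the cutoff falls back to the existing count-then-convert logic.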
| Method | Toolchain | Text | Mean | Error | StdDev | Median | Ratio | RatioSD |
|---|---|---|---|---|---|---|---|---|
| GetString_FromByteArray | master | | 1.098 ns | 0.0619 ns | 0.0549 ns | 1.075 ns | 1.00 | 0.00 |
| GetString_FromByteArray | proto | | 8.682 ns | 0.2368 ns | 0.1978 ns | 8.664 ns | 7.91 | 0.48 |
| GetByteArray_FromString | master | | 21.109 ns | 0.5399 ns | 1.4032 ns | 20.549 ns | 1.00 | 0.00 |
| GetByteArray_FromString | proto | | 8.081 ns | 0.1581 ns | 0.1401 ns | 8.025 ns | 0.36 | 0.03 |
| GetString_FromByteArray | master | Hello | 23.182 ns | 0.3626 ns | 0.3028 ns | 23.191 ns | 1.00 | 0.00 |
| GetString_FromByteArray | proto | Hello | 18.949 ns | 0.4584 ns | 1.1997 ns | 18.436 ns | 0.88 | 0.06 |
| GetByteArray_FromString | master | Hello | 23.336 ns | 0.5520 ns | 1.0232 ns | 22.987 ns | 1.00 | 0.00 |
| GetByteArray_FromString | proto | Hello | 16.103 ns | 0.1553 ns | 0.1377 ns | 16.098 ns | 0.67 | 0.03 |
| GetString_FromByteArray | master | Hello world! | 27.745 ns | 0.6358 ns | 0.6803 ns | 27.784 ns | 1.00 | 0.00 |
| GetString_FromByteArray | proto | Hello world! | 20.178 ns | 0.2909 ns | 0.2429 ns | 20.168 ns | 0.73 | 0.02 |
| GetByteArray_FromString | master | Hello world! | 26.552 ns | 0.6135 ns | 1.4935 ns | 25.959 ns | 1.00 | 0.00 |
| GetByteArray_FromString | proto | Hello world! | 16.829 ns | 0.4128 ns | 0.6303 ns | 16.582 ns | 0.63 | 0.04 |
| GetString_FromByteArray | master | Lorem(...)elit. [56] | 38.538 ns | 0.3517 ns | 0.3290 ns | 38.418 ns | 1.00 | 0.00 |
| GetString_FromByteArray | proto | Lorem(...)elit. [56] | 38.339 ns | 0.4026 ns | 0.3766 ns | 38.403 ns | 0.99 | 0.01 |
| GetByteArray_FromString | master | Lorem(...)elit. [56] | 34.777 ns | 0.8004 ns | 1.2695 ns | 34.059 ns | 1.00 | 0.00 |
| GetByteArray_FromString | proto | Lorem(...)elit. [56] | 36.436 ns | 0.8080 ns | 1.3937 ns | 35.688 ns | 1.05 | 0.06 |
| GetString_FromByteArray | master | Nǐhǎo 你好 | 44.171 ns | 0.8742 ns | 0.7749 ns | 44.235 ns | 1.00 | 0.00 |
| GetString_FromByteArray | proto | Nǐhǎo 你好 | 30.719 ns | 0.3818 ns | 0.3188 ns | 30.668 ns | 0.70 | 0.01 |
| GetByteArray_FromString | master | Nǐhǎo 你好 | 42.092 ns | 0.6221 ns | 0.4857 ns | 42.092 ns | 1.00 | 0.00 |
| GetByteArray_FromString | proto | Nǐhǎo 你好 | 27.424 ns | 0.2978 ns | 0.2487 ns | 27.441 ns | 0.65 | 0.01 |
| GetString_FromByteArray | master | Γεια σου κόσμε | 47.637 ns | 0.5525 ns | 0.4614 ns | 47.535 ns | 1.00 | 0.00 |
| GetString_FromByteArray | proto | Γεια σου κόσμε | 31.119 ns | 0.4889 ns | 0.4334 ns | 31.074 ns | 0.65 | 0.01 |
| GetByteArray_FromString | master | Γεια σου κόσμε | 45.591 ns | 0.4591 ns | 0.4295 ns | 45.518 ns | 1.00 | 0.00 |
| GetByteArray_FromString | proto | Γεια σου κόσμε | 30.522 ns | 0.6794 ns | 0.8592 ns | 30.236 ns | 0.67 | 0.02 |
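For context on reproducing the numbers above: a BenchmarkDotNet harness along the lines below measures the same two operations. The parameter values and member names are my approximation of the table, not necessarily the exact benchmark project used, and the master/proto comparison comes from pointing separate runs at two different runtime builds, which this snippet doesn't show.

```csharp
using System.Text;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// Approximate harness for the table above; the actual benchmark project may differ.
public class Utf8SmallInputBenchmarks
{
    private byte[] _utf8Bytes;

    [Params("", "Hello", "Hello world!", "Nǐhǎo 你好", "Γεια σου κόσμε")]
    public string Text { get; set; }

    [GlobalSetup]
    public void Setup() => _utf8Bytes = Encoding.UTF8.GetBytes(Text);

    [Benchmark]
    public string GetString_FromByteArray() => Encoding.UTF8.GetString(_utf8Bytes);

    [Benchmark]
    public byte[] GetByteArray_FromString() => Encoding.UTF8.GetBytes(Text);

    public static void Main() => BenchmarkRunner.Run<Utf8SmallInputBenchmarks>();
}
```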
Marked WIP because it's not fully tested and because I'm trying to figure out whether it would make sense to provide additional overloads beyond these two. In my experience, the two overloads under consideration here are far and away the most commonly used.
These methods are overridden on the internal sealed type rather than on the `UTF8Encoding` base type because the base type is configurable in unexpected ways. For example, somebody may have configured a `UTF8Encoding` instance with a custom fallback mechanism, or they may have overridden a virtual method in a manner we're not anticipating. Putting this logic on the internal sealed type instead of the base type sidesteps those potential problems.
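As a concrete illustration of that configurability, both instances below are perfectly valid uses of the public surface area, and neither is the fixed-configuration singleton behind `Encoding.UTF8`, so under this change neither would take the new fast path:

```csharp
using System;
using System.Text;

// Two ways callers can configure UTF-8 decoding that a shortcut must respect.
var throwOnInvalid = new UTF8Encoding(
    encoderShouldEmitUTF8Identifier: false,
    throwOnInvalidBytes: true);               // exception fallback instead of U+FFFD

Encoding customFallback = Encoding.GetEncoding(
    "utf-8",
    new EncoderReplacementFallback("?"),
    new DecoderReplacementFallback("?"));     // user-supplied replacement fallbacks

byte[] invalidUtf8 = { 0xC3, 0x28 };          // truncated two-byte sequence

Console.WriteLine(customFallback.GetString(invalidUtf8));     // "?(" via the custom fallback
try { throwOnInvalid.GetString(invalidUtf8); }
catch (DecoderFallbackException) { Console.WriteLine("throws, as configured"); }

// The default singleton always has the same well-known configuration, which is
// what makes it safe to special-case on the internal sealed type.
Console.WriteLine(Encoding.UTF8.GetString(invalidUtf8));      // "\uFFFD(" replacement char + '('
```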
It's also possible that we don't want to do this because it represents duplication of logic we'd rather not have. That's understandable - I'm primarily putting this out there to gauge the temperature of the response.