Add optimized UTF-8 validation and transcoding apis, hook them up to UTF8Encoding by GrabYourPitchforks · Pull Request #21948 · dotnet/coreclr

Quick benchmarks, using various corpus texts from Project Gutenberg.

| Method | Toolchain | Corpus | Mean | Error | StdDev | Ratio | RatioSD |
|---|---|---|---:|---:|---:|---:|---:|
| GetByteCount | 3.0-master | 11-0.txt | 7,087.7 us | 137.611 us | 121.989 us | 1.00 | 0.00 |
| GetByteCount | utf8_1 | 11-0.txt | 2,527.0 us | 23.333 us | 20.684 us | 0.36 | 0.01 |
| GetBytes | 3.0-master | 11-0.txt | 15,774.5 us | 314.180 us | 322.640 us | 1.00 | 0.00 |
| GetBytes | utf8_1 | 11-0.txt | 10,600.3 us | 59.802 us | 53.013 us | 0.67 | 0.01 |
| GetCharCount | 3.0-master | 11-0.txt | 6,257.4 us | 46.583 us | 43.574 us | 1.00 | 0.00 |
| GetCharCount | utf8_1 | 11-0.txt | 5,116.3 us | 30.386 us | 26.936 us | 0.82 | 0.01 |
| GetChars | 3.0-master | 11-0.txt | 16,091.1 us | 61.085 us | 54.150 us | 1.00 | 0.00 |
| GetChars | utf8_1 | 11-0.txt | 12,679.0 us | 120.497 us | 106.817 us | 0.79 | 0.01 |
| GetByteCount | 3.0-master | 11.txt | 2,192.5 us | 17.319 us | 13.522 us | 1.00 | 0.00 |
| GetByteCount | utf8_1 | 11.txt | 955.8 us | 7.903 us | 6.599 us | 0.44 | 0.00 |
| GetBytes | 3.0-master | 11.txt | 7,759.5 us | 59.931 us | 53.128 us | 1.00 | 0.00 |
| GetBytes | utf8_1 | 11.txt | 2,303.0 us | 19.080 us | 17.847 us | 0.30 | 0.00 |
| GetCharCount | 3.0-master | 11.txt | 1,093.5 us | 8.459 us | 7.063 us | 1.00 | 0.00 |
| GetCharCount | utf8_1 | 11.txt | 325.3 us | 2.738 us | 2.561 us | 0.30 | 0.00 |
| GetChars | 3.0-master | 11.txt | 6,521.0 us | 128.918 us | 176.465 us | 1.00 | 0.00 |
| GetChars | utf8_1 | 11.txt | 1,549.4 us | 11.943 us | 10.587 us | 0.24 | 0.01 |
| GetByteCount | 3.0-master | 25249-0.txt | 9,129.8 us | 23.333 us | 19.484 us | 1.00 | 0.00 |
| GetByteCount | utf8_1 | 25249-0.txt | 1,166.1 us | 6.068 us | 5.067 us | 0.13 | 0.00 |
| GetBytes | 3.0-master | 25249-0.txt | 21,678.0 us | 81.633 us | 76.360 us | 1.00 | 0.00 |
| GetBytes | utf8_1 | 25249-0.txt | 10,845.7 us | 54.393 us | 48.218 us | 0.50 | 0.00 |
| GetCharCount | 3.0-master | 25249-0.txt | 13,441.7 us | 84.330 us | 74.756 us | 1.00 | 0.00 |
| GetCharCount | utf8_1 | 25249-0.txt | 7,063.4 us | 71.013 us | 66.426 us | 0.53 | 0.01 |
| GetChars | 3.0-master | 25249-0.txt | 29,216.9 us | 122.028 us | 101.899 us | 1.00 | 0.00 |
| GetChars | utf8_1 | 25249-0.txt | 16,593.1 us | 115.657 us | 96.579 us | 0.57 | 0.00 |
| GetByteCount | 3.0-master | 30774-0.txt | 6,624.6 us | 40.983 us | 36.330 us | 1.00 | 0.00 |
| GetByteCount | utf8_1 | 30774-0.txt | 1,011.6 us | 18.997 us | 17.770 us | 0.15 | 0.00 |
| GetBytes | 3.0-master | 30774-0.txt | 19,557.6 us | 64.271 us | 56.975 us | 1.00 | 0.00 |
| GetBytes | utf8_1 | 30774-0.txt | 10,282.8 us | 104.523 us | 92.657 us | 0.53 | 0.01 |
| GetCharCount | 3.0-master | 30774-0.txt | 12,847.1 us | 181.605 us | 169.873 us | 1.00 | 0.00 |
| GetCharCount | utf8_1 | 30774-0.txt | 7,234.3 us | 23.763 us | 18.553 us | 0.57 | 0.01 |
| GetChars | 3.0-master | 30774-0.txt | 26,399.1 us | 119.822 us | 100.056 us | 1.00 | 0.00 |
| GetChars | utf8_1 | 30774-0.txt | 17,474.5 us | 434.245 us | 564.641 us | 0.67 | 0.03 |
| GetByteCount | 3.0-master | 39251-0.txt | 12,026.9 us | 112.657 us | 99.867 us | 1.00 | 0.00 |
| GetByteCount | utf8_1 | 39251-0.txt | 1,295.9 us | 11.245 us | 10.519 us | 0.11 | 0.00 |
| GetBytes | 3.0-master | 39251-0.txt | 31,157.5 us | 273.212 us | 255.563 us | 1.00 | 0.00 |
| GetBytes | utf8_1 | 39251-0.txt | 16,027.7 us | 90.350 us | 75.446 us | 0.51 | 0.01 |
| GetCharCount | 3.0-master | 39251-0.txt | 20,423.3 us | 246.767 us | 230.826 us | 1.00 | 0.00 |
| GetCharCount | utf8_1 | 39251-0.txt | 13,669.7 us | 168.254 us | 157.385 us | 0.67 | 0.01 |
| GetChars | 3.0-master | 39251-0.txt | 40,840.9 us | 399.104 us | 373.322 us | 1.00 | 0.00 |
| GetChars | utf8_1 | 39251-0.txt | 28,058.7 us | 228.340 us | 213.590 us | 0.69 | 0.01 |

These texts are large (~100KB). I'll work on getting benchmarks for smaller texts as well.
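
For reading the Ratio column: it appears to be, to within rounding, the utf8_1 mean divided by the 3.0-master mean for the same method and corpus, so lower is better and its reciprocal is the speedup. The sketch below recomputes it for the 11-0.txt rows. The helper names `ratio` and `speedup` are made up for illustration and are not part of the benchmark harness.

```python
# Mean times in microseconds, copied from the 11-0.txt rows above,
# as (3.0-master baseline, utf8_1 candidate) pairs.
means_us = {
    "GetByteCount": (7087.7, 2527.0),
    "GetBytes": (15774.5, 10600.3),
    "GetCharCount": (6257.4, 5116.3),
    "GetChars": (16091.1, 12679.0),
}


def ratio(baseline_us, candidate_us):
    """Candidate time as a fraction of the baseline time (lower is better)."""
    return candidate_us / baseline_us


def speedup(baseline_us, candidate_us):
    """How many times faster the candidate is than the baseline."""
    return baseline_us / candidate_us


for method, (baseline, candidate) in means_us.items():
    print(
        f"{method}: ratio {ratio(baseline, candidate):.2f}, "
        f"speedup {speedup(baseline, candidate):.2f}x"
    )
```

Recomputing from the rounded means reproduces the reported ratios for these four rows (0.36, 0.67, 0.82, 0.79). Other rows may differ in the last digit if the tool averages per-measurement ratios rather than dividing the two means.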