Add optimized UTF-8 validation and transcoding apis, hook them up to UTF8Encoding by GrabYourPitchforks · Pull Request #21948 · dotnet/coreclr
Quick benchmarks, using various corpus texts from Project Gutenberg.
Method | Toolchain | Corpus | Mean | Error | StdDev | Ratio | RatioSD |
---|---|---|---|---|---|---|---|
GetByteCount | 3.0-master | 11-0.txt | 7,087.7 us | 137.611 us | 121.989 us | 1.00 | 0.00 |
GetByteCount | utf8_1 | 11-0.txt | 2,527.0 us | 23.333 us | 20.684 us | 0.36 | 0.01 |
GetBytes | 3.0-master | 11-0.txt | 15,774.5 us | 314.180 us | 322.640 us | 1.00 | 0.00 |
GetBytes | utf8_1 | 11-0.txt | 10,600.3 us | 59.802 us | 53.013 us | 0.67 | 0.01 |
GetCharCount | 3.0-master | 11-0.txt | 6,257.4 us | 46.583 us | 43.574 us | 1.00 | 0.00 |
GetCharCount | utf8_1 | 11-0.txt | 5,116.3 us | 30.386 us | 26.936 us | 0.82 | 0.01 |
GetChars | 3.0-master | 11-0.txt | 16,091.1 us | 61.085 us | 54.150 us | 1.00 | 0.00 |
GetChars | utf8_1 | 11-0.txt | 12,679.0 us | 120.497 us | 106.817 us | 0.79 | 0.01 |
GetByteCount | 3.0-master | 11.txt | 2,192.5 us | 17.319 us | 13.522 us | 1.00 | 0.00 |
GetByteCount | utf8_1 | 11.txt | 955.8 us | 7.903 us | 6.599 us | 0.44 | 0.00 |
GetBytes | 3.0-master | 11.txt | 7,759.5 us | 59.931 us | 53.128 us | 1.00 | 0.00 |
GetBytes | utf8_1 | 11.txt | 2,303.0 us | 19.080 us | 17.847 us | 0.30 | 0.00 |
GetCharCount | 3.0-master | 11.txt | 1,093.5 us | 8.459 us | 7.063 us | 1.00 | 0.00 |
GetCharCount | utf8_1 | 11.txt | 325.3 us | 2.738 us | 2.561 us | 0.30 | 0.00 |
GetChars | 3.0-master | 11.txt | 6,521.0 us | 128.918 us | 176.465 us | 1.00 | 0.00 |
GetChars | utf8_1 | 11.txt | 1,549.4 us | 11.943 us | 10.587 us | 0.24 | 0.01 |
GetByteCount | 3.0-master | 25249-0.txt | 9,129.8 us | 23.333 us | 19.484 us | 1.00 | 0.00 |
GetByteCount | utf8_1 | 25249-0.txt | 1,166.1 us | 6.068 us | 5.067 us | 0.13 | 0.00 |
GetBytes | 3.0-master | 25249-0.txt | 21,678.0 us | 81.633 us | 76.360 us | 1.00 | 0.00 |
GetBytes | utf8_1 | 25249-0.txt | 10,845.7 us | 54.393 us | 48.218 us | 0.50 | 0.00 |
GetCharCount | 3.0-master | 25249-0.txt | 13,441.7 us | 84.330 us | 74.756 us | 1.00 | 0.00 |
GetCharCount | utf8_1 | 25249-0.txt | 7,063.4 us | 71.013 us | 66.426 us | 0.53 | 0.01 |
GetChars | 3.0-master | 25249-0.txt | 29,216.9 us | 122.028 us | 101.899 us | 1.00 | 0.00 |
GetChars | utf8_1 | 25249-0.txt | 16,593.1 us | 115.657 us | 96.579 us | 0.57 | 0.00 |
GetByteCount | 3.0-master | 30774-0.txt | 6,624.6 us | 40.983 us | 36.330 us | 1.00 | 0.00 |
GetByteCount | utf8_1 | 30774-0.txt | 1,011.6 us | 18.997 us | 17.770 us | 0.15 | 0.00 |
GetBytes | 3.0-master | 30774-0.txt | 19,557.6 us | 64.271 us | 56.975 us | 1.00 | 0.00 |
GetBytes | utf8_1 | 30774-0.txt | 10,282.8 us | 104.523 us | 92.657 us | 0.53 | 0.01 |
GetCharCount | 3.0-master | 30774-0.txt | 12,847.1 us | 181.605 us | 169.873 us | 1.00 | 0.00 |
GetCharCount | utf8_1 | 30774-0.txt | 7,234.3 us | 23.763 us | 18.553 us | 0.57 | 0.01 |
GetChars | 3.0-master | 30774-0.txt | 26,399.1 us | 119.822 us | 100.056 us | 1.00 | 0.00 |
GetChars | utf8_1 | 30774-0.txt | 17,474.5 us | 434.245 us | 564.641 us | 0.67 | 0.03 |
GetByteCount | 3.0-master | 39251-0.txt | 12,026.9 us | 112.657 us | 99.867 us | 1.00 | 0.00 |
GetByteCount | utf8_1 | 39251-0.txt | 1,295.9 us | 11.245 us | 10.519 us | 0.11 | 0.00 |
GetBytes | 3.0-master | 39251-0.txt | 31,157.5 us | 273.212 us | 255.563 us | 1.00 | 0.00 |
GetBytes | utf8_1 | 39251-0.txt | 16,027.7 us | 90.350 us | 75.446 us | 0.51 | 0.01 |
GetCharCount | 3.0-master | 39251-0.txt | 20,423.3 us | 246.767 us | 230.826 us | 1.00 | 0.00 |
GetCharCount | utf8_1 | 39251-0.txt | 13,669.7 us | 168.254 us | 157.385 us | 0.67 | 0.01 |
GetChars | 3.0-master | 39251-0.txt | 40,840.9 us | 399.104 us | 373.322 us | 1.00 | 0.00 |
GetChars | utf8_1 | 39251-0.txt | 28,058.7 us | 228.340 us | 213.590 us | 0.69 | 0.01 |
These texts are large (~100KB). I'll work on getting benchmarks for smaller texts as well.
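For reading the Ratio column: a minimal sketch, assuming Ratio is approximately the utf8_1 mean divided by the 3.0-master mean for the same method and corpus (the benchmark tool may compute it slightly differently). The numbers are the GetByteCount means copied from the table above; the `ratio` helper and the `means_us` name are only illustrative.

```python
# Mean times in microseconds for GetByteCount, copied from the table above.
# Each entry is (3.0-master mean, utf8_1 mean).
means_us = {
    "11-0.txt": (7087.7, 2527.0),
    "11.txt": (2192.5, 955.8),
    "25249-0.txt": (9129.8, 1166.1),
    "30774-0.txt": (6624.6, 1011.6),
    "39251-0.txt": (12026.9, 1295.9),
}


def ratio(baseline_us, candidate_us):
    """Candidate mean relative to baseline mean (assumed meaning of Ratio)."""
    return candidate_us / baseline_us


for corpus, (baseline, candidate) in means_us.items():
    r = ratio(baseline, candidate)
    # A ratio below 1.00 means utf8_1 is faster; 1 / ratio is the speedup factor.
    print(f"{corpus}: ratio {r:.2f}, speedup {1 / r:.1f}x")
```

Under that assumption, the recomputed ratios round to the values in the Ratio column for these rows.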