use new System.Text.Ascii APIs, remove internal helpers by adamsitnik · Pull Request #48368 · dotnet/aspnetcore (original) (raw)
I took a closer look at BytesOrdinalEqualsStringAndAscii
.
If the remainder of the vectorized loop contains zero (or the input is simply too small to go the vectorized code path), but the inputs are equal it can still return true
:
if (offset < count) |
---|
{ |
var ch = (char)Unsafe.Add(ref bytes, offset); |
if (((ch & 0x80) != 0) | |
{ |
goto NotEqual; |
} |
} |
// End of input reached, there are no inequalities via widening; so the input bytes are both ascii |
// and a match to the string if it was converted via Encoding.ASCII.GetString(...) |
return true; |
But the most important thing is that it checks for zero only one of the inputs:
var vector = Unsafe.ReadUnaligned<Vector<sbyte>>(ref Unsafe.Add(ref bytes, offset)); |
---|
if (!CheckBytesInAsciiRange(vector)) |
And from what I can see the other input is always a const (known header for example):
$@"// Matched a known header |
---|
It seems that only one of the inputs may contain null characters (because the other one is typically a pre-defined const) and hence Ascii.Equals
will always return false
in such cases because the inputs will simply not be equal?
@BrennanConroy is my understanding correct?
Edit: I just realized that the existing helper method ensures that the string
input does not contain zeros:
private static bool IsValidHeaderString(string value) |
---|
{ |
// Method for Debug.Assert to ensure BytesOrdinalEqualsStringAndAscii |
// is not called with an unvalidated string comparitor. |
try |
{ |
if (value is null) |
{ |
return false; |
} |
new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true).GetByteCount(value); |
return !value.Contains('\0'); |
So it should be safe to switch to Ascii.Equals
, but to be extra safe I can keep the old helper method just to verify the input and delegate to Ascii.Equals
:
public static bool BytesOrdinalEqualsStringAndAscii(string previousValue, ReadOnlySpan newValue) { // previousValue is a previously materialized string which must have already passed validation. Debug.Assert(IsValidHeaderString(previousValue));
return Ascii.Equals(previousValue, newValue);
}