Optimize CheckIriUnicodeRange by MihaZupan · Pull Request #31860 · dotnet/runtime (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation17 Commits6 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
Avoid using an intermediate string to do range comparisons, behavior remains the same for all inputs.
Perf for that method alone is ~10-15x,
Perf for "scheme:" + { '\ud83f', '\udffe' } * 1000
Method | Toolchain | Mean | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|
NewUr1 | clean\CoreRun.exe | 345.1 us | 1.31 | 69.3359 | 13.6719 | - | 285.44 KB |
NewUr1 | new\CoreRun.exe | 270.0 us | 1.00 | 62.0117 | 12.2070 | - | 254.19 KB |
The majority of the clauses in the if
statement aren't necessary. For instance, here is untested optimized code:
// This method implements the ABNF checks per https://tools.ietf.org/html/rfc3987#section-2.2 internal static bool CheckIriUnicodeRange(char highSurr, char lowSurr, ref bool surrogatePair, bool isQuery) { bool inRange = false; surrogatePair = false;
Debug.Assert(char.IsHighSurrogate(highSurr));
if (Rune.TryCreate(highSurr, lowSurr, out Rune rune))
{
surrogatePair = true;
// U+xxFFFE..U+xxFFFF is always private use for all planes, so we exclude it.
// U+E0000..U+E0FFF is disallowed per the 'ucschar' definition in the ABNF.
// U+F0000 and above are only allowed for 'iprivate' per the ABNF (isQuery = true).
inRange = ((ushort)rune.Value < 0xFFFE)
&& ((uint)(rune.Value - 0xE0000) >= (uint)(0xE1000 - 0xE0000))
&& (isQuery || rune.Value < 0xF0000);
}
return inRange;
}
Should
Rune.TryCreate
be prefered overchar.IsLowSurrogate
andchar.ConvertToUtf32
? As far as I can tell they are effectively identical.
The only real difference is that the Rune.TryCreate
performs the check once (as it's a single method call), while char.IsLowSurrogate
and char.ConvertToUtf32
perform the check twice.
Co-Authored-By: Stephen Toub stoub@microsoft.com
I used the optimization for similar range checks @GrabYourPitchforks suggested, only changing the cast to ushort to AND 0xFFFF.
I verified that the behaviour for all inputs is the same.
ghost locked as resolved and limited conversation to collaborators