Optimize CheckIriUnicodeRange by MihaZupan · Pull Request #31860 · dotnet/runtime (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation17 Commits6 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

MihaZupan

Avoid using an intermediate string to do range comparisons, behavior remains the same for all inputs.

Perf for that method alone is ~10-15x,

Perf for "scheme:" + { '\ud83f', '\udffe' } * 1000

Method Toolchain Mean Ratio Gen 0 Gen 1 Gen 2 Allocated
NewUr1 clean\CoreRun.exe 345.1 us 1.31 69.3359 13.6719 - 285.44 KB
NewUr1 new\CoreRun.exe 270.0 us 1.00 62.0117 12.2070 - 254.19 KB

stephentoub

stephentoub

@EgorBo

@GrabYourPitchforks

The majority of the clauses in the if statement aren't necessary. For instance, here is untested optimized code:

// This method implements the ABNF checks per https://tools.ietf.org/html/rfc3987#section-2.2 internal static bool CheckIriUnicodeRange(char highSurr, char lowSurr, ref bool surrogatePair, bool isQuery) { bool inRange = false; surrogatePair = false;

Debug.Assert(char.IsHighSurrogate(highSurr));

if (Rune.TryCreate(highSurr, lowSurr, out Rune rune))
{
    surrogatePair = true;

    // U+xxFFFE..U+xxFFFF is always private use for all planes, so we exclude it.
    // U+E0000..U+E0FFF is disallowed per the 'ucschar' definition in the ABNF.
    // U+F0000 and above are only allowed for 'iprivate' per the ABNF (isQuery = true).

    inRange = ((ushort)rune.Value < 0xFFFE)
        && ((uint)(rune.Value - 0xE0000) >= (uint)(0xE1000 - 0xE0000))
        && (isQuery || rune.Value < 0xF0000);
}

return inRange;

}

@MihaZupan

@GrabYourPitchforks

Should Rune.TryCreate be prefered over char.IsLowSurrogate and char.ConvertToUtf32? As far as I can tell they are effectively identical.

The only real difference is that the Rune.TryCreate performs the check once (as it's a single method call), while char.IsLowSurrogate and char.ConvertToUtf32 perform the check twice.

@MihaZupan

@MihaZupan @stephentoub

Co-Authored-By: Stephen Toub stoub@microsoft.com

@MihaZupan

@MihaZupan

@MihaZupan

I used the optimization for similar range checks @GrabYourPitchforks suggested, only changing the cast to ushort to AND 0xFFFF.

I verified that the behaviour for all inputs is the same.

GrabYourPitchforks

@MihaZupan

@MihaZupan

@ghost ghost locked as resolved and limited conversation to collaborators

Dec 10, 2020