Fix and optimize EscapeUnescapeIri by MihaZupan · Pull Request #32025 · dotnet/runtime

Allocate the 4-byte buffer on the stack rather than on the heap.

Perf for "scheme:" + { '\ud83f', '\udffe' } * 1000 (same input as in #31860)

| Method | Toolchain | Mean | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|
| NewUri | clean\CoreRun.exe | 351.7 us | 1.06 | 69.3359 | 13.6719 | - | 285.44 KB |
| NewUri | new\CoreRun.exe | 330.8 us | 1.00 | 54.1992 | - | - | 222.94 KB |
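To make the change concrete, here is a minimal sketch of the two allocation strategies. It is not the code from `System.Private.Uri`; the class and method names are hypothetical, and `Convert.ToHexString` (which assumes .NET 5 or later) is used only to make the result easy to inspect.

```csharp
using System;
using System.Text;

static class StackBufferSketch
{
    // A single Unicode code point never needs more than 4 UTF-8 bytes.
    private const int MaxUtf8BytesPerCodePoint = 4;

    // Heap version: allocates a fresh byte[] on every call.
    public static string EncodeOnHeap(ReadOnlySpan<char> codePoint)
    {
        byte[] buffer = new byte[MaxUtf8BytesPerCodePoint];
        int written = Encoding.UTF8.GetBytes(codePoint, buffer);
        return Convert.ToHexString(buffer.AsSpan(0, written));
    }

    // Stack version: the 4-byte scratch buffer lives on the stack,
    // so the buffer itself produces no garbage.
    public static string EncodeOnStack(ReadOnlySpan<char> codePoint)
    {
        Span<byte> buffer = stackalloc byte[MaxUtf8BytesPerCodePoint];
        int written = Encoding.UTF8.GetBytes(codePoint, buffer);
        return Convert.ToHexString(buffer.Slice(0, written));
    }
}
```

Both methods return the same bytes for the benchmark's surrogate pair; only the location of the scratch buffer differs.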

MihaZupan changed the title ~~Uri cleanup bytearray alloc~~ Remove byte[] allocation per encoded character in Uri

Feb 10, 2020

}
for (int count = 0; count < encodedBytesCount; ++count)
{
UriHelper.EscapeAsciiChar((char)*(pEncodedBytes + count), ref dest);

Can we use this as an opportunity to get rid of some unsafe code and just use spans? e.g.

Span<byte> encodedBytes = stackalloc byte[MaxNumberOfBytesEncoded];
int encodedBytesCount = Encoding.UTF8.GetBytes(new ReadOnlySpan<char>(pInput + next, surrogatePair ? 2 : 1), encodedBytes);
for (int count = 0; count < encodedBytesCount; count++)
{
    UriHelper.EscapeAsciiChar((char)encodedBytes[count], ref dest);
}

I see a ~2% perf hit on the benchmark by doing so.
If we're okay with that I can make the change.

2% might be measurement noise. I ran into the same issue last week, where the perf deviated by ±2% between runs without any code changes.

What is the microbenchmark?

Try running the perf test multiple times and measure the deviation. Additionally, you can set CPU affinity for the benchmark process; that can stabilize the results a bit.
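As a rough illustration of that advice, the sketch below times an action several times and reports the mean and sample standard deviation, with an optional helper that pins the process to one core. All names are hypothetical, the affinity helper assumes .NET 5 or later, and this is not a substitute for a proper benchmarking harness.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

static class RunDeviationSketch
{
    // Times 'action' 'runs' times and returns each duration in microseconds.
    public static double[] MeasureMicroseconds(Action action, int runs)
    {
        var samples = new double[runs];
        for (int i = 0; i < runs; i++)
        {
            var sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            samples[i] = sw.Elapsed.TotalMilliseconds * 1000.0;
        }
        return samples;
    }

    // Mean and sample standard deviation (n - 1 in the denominator).
    public static (double Mean, double StdDev) Summarize(double[] samples)
    {
        double mean = samples.Average();
        double variance = samples.Sum(s => (s - mean) * (s - mean)) / (samples.Length - 1);
        return (mean, Math.Sqrt(variance));
    }

    // Pins the current process to the first logical core.
    // ProcessorAffinity is only supported on Windows and Linux.
    public static void PinToFirstCore()
    {
        if (OperatingSystem.IsWindows() || OperatingSystem.IsLinux())
        {
            Process.GetCurrentProcess().ProcessorAffinity = (IntPtr)1;
        }
    }
}
```

If the standard deviation across runs is of the same order as the difference being measured (here about 2%), the difference cannot be distinguished from noise.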

Oh, you're not actually measuring System.Private.Uri, but rather copying the source out into the benchmark? That's not going to be equivalent. For example, we explicitly clear the localsinit flag for all framework assemblies, but that won't happen for your code compiled into your benchmark, which means things like stackalloc are going to be more expensive.

Encoding.UTF8.GetBytes is a bit heavyweight for this. I'd instead recommend a slight variation of what @scalablecory recommended:

Span<byte> encodedBytes = stackalloc byte[MaxNumberOfBytesEncoded];

Rune rune = surrogatePair ? new Rune(pInput[next], pInput[next + 1]) : new Rune(pInput[next]);
int encodedBytesCount = rune.EncodeToUtf8(encodedBytes);
encodedBytes = encodedBytes.Slice(0, encodedBytesCount);

foreach (byte b in encodedBytes)
{
    UriHelper.EscapeAsciiChar((char)b, ref dest);
}
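A self-contained version of this suggestion might look like the sketch below. `AppendEscapedByte` is a hypothetical stand-in for `UriHelper.EscapeAsciiChar` (which is internal and writes to a `char` buffer rather than a `StringBuilder`), so treat this as an illustration of the `Rune.EncodeToUtf8` approach rather than the code that was merged.

```csharp
using System;
using System.Text;

static class RuneEscapeSketch
{
    private const string HexUpperChars = "0123456789ABCDEF";

    // Hypothetical stand-in for UriHelper.EscapeAsciiChar: appends "%XX" for one byte.
    private static void AppendEscapedByte(StringBuilder dest, byte b)
    {
        dest.Append('%');
        dest.Append(HexUpperChars[b >> 4]);
        dest.Append(HexUpperChars[b & 0xF]);
    }

    // Percent-encodes the UTF-8 representation of a single Unicode scalar value.
    public static string Escape(Rune rune)
    {
        Span<byte> utf8 = stackalloc byte[4]; // a Rune needs at most 4 UTF-8 bytes
        int written = rune.EncodeToUtf8(utf8);

        var dest = new StringBuilder(written * 3);
        foreach (byte b in utf8.Slice(0, written))
        {
            AppendEscapedByte(dest, b);
        }
        return dest.ToString();
    }
}
```

For example, `RuneEscapeSketch.Escape(new Rune('\ud83f', '\udffe'))` yields `%F0%9F%BF%BE`, the same bytes the benchmark input produces per surrogate pair.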

Removing localsinit reduces the gap substantially.
Using @GrabYourPitchforks's approach beats all of the above 👍

| Method | Mean | Error | StdDev |
|---|---|---|---|
| Unsafe | 33.68 us | 0.401 us | 0.356 us |
| Span | 35.34 us | 0.675 us | 0.853 us |
| SpanSlice | 35.23 us | 1.043 us | 0.871 us |
| Rune | 28.13 us | 0.547 us | 0.512 us |

I'll make the change.

To satisfy my curiosity, I added a benchmark with Rune.EncodeToUtf8 encoding to a byte*, avoiding spans. It performs ~8% better than the Rune benchmark above.
(I am not saying that I prefer it over the Rune & Span based one)

The CheckIriUnicodeRange method also performs an unnecessary allocation. That could be addressed (and the logic simplified greatly) via something akin to the following untested code:

// This method implements the ABNF checks per https://tools.ietf.org/html/rfc3987#section-2.2
internal static bool CheckIriUnicodeRange(char highSurr, char lowSurr, ref bool surrogatePair, bool isQuery)
{
    bool inRange = false;
    surrogatePair = false;

    Debug.Assert(char.IsHighSurrogate(highSurr));

    if (Rune.TryCreate(highSurr, lowSurr, out Rune rune))
    {
        surrogatePair = true;

        // U+xxFFFE..U+xxFFFF is always private use for all planes, so we exclude it.
        // U+E0000..U+E0FFF is disallowed per the 'ucschar' definition in the ABNF.
        // U+F0000 and above are only allowed for 'iprivate' per the ABNF (isQuery = true).

        inRange = ((ushort)rune.Value < 0xFFFE)
            && ((uint)(rune.Value - 0xE0000) >= (uint)(0xE1000 - 0xE0000))
            && (isQuery || rune.Value < 0xF0000);
    }

    return inRange;
}
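To see how the three conditions behave, the sketch below applies the same checks to an already-combined supplementary code point. The class and method names are hypothetical, and `unchecked` is spelled out so the intentional wrap-around in the casts is explicit.

```csharp
using System;
using System.Text;

static class IriRangeSketch
{
    // Same three conditions as the suggested CheckIriUnicodeRange body,
    // applied to a Rune that was built from a valid surrogate pair.
    public static bool IsInSupplementaryIriRange(Rune rune, bool isQuery)
    {
        int value = rune.Value;
        unchecked
        {
            // Low 16 bits of 0xFFFE or 0xFFFF are excluded in every plane.
            bool notPlaneEnd = (ushort)value < 0xFFFE;

            // A single unsigned comparison rejects U+E0000..U+E0FFF:
            // values below 0xE0000 wrap around to very large uints.
            bool notInE0Block = (uint)(value - 0xE0000) >= 0x1000u;

            // U+F0000 and above are only accepted in the query component.
            bool allowedPlane = isQuery || value < 0xF0000;

            return notPlaneEnd && notInE0Block && allowedPlane;
        }
    }
}
```

With these checks, U+1F600 is accepted, while U+1FFFE (the benchmark character) and U+E0001 are rejected, and U+F0000 is accepted only when `isQuery` is `true`.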

@GrabYourPitchforks Can you comment on #31860 regarding CheckIriUnicodeRange? Are you saying that the majority of those range checks are not needed?

Sorry, didn't see the other issue. Will copy the comment there. And yes, the majority of the checks are unnecessary.

MihaZupan changed the title ~~Remove byte[] allocation per encoded character in Uri~~ Fix and optimize EscapeUnescapeIri

Feb 13, 2020

Turns out EscapeUnescapeIri was not incrementing the index when escaping a surrogate pair. That led to the low surrogate being escaped again, producing wrong results and hitting the fallback-path in Utf8Encoding (that allocates).

Correcting the bug and using Rune now shows much nicer numbers:

| Method | Toolchain | Mean | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|
| NewUri | clean\CoreRun.exe | 315.9 us | 2.47 | 69.3359 | 13.6719 | - | 285.44 KB |
| NewUri | new\CoreRun.exe | 127.8 us | 1.00 | 19.0430 | 2.6855 | - | 78.41 KB |

This also makes the improvement in #31860 more noticeable. With both changes combined, the numbers are:

| Method | Toolchain | Mean | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|
| NewUri | clean\CoreRun.exe | 315.94 us | 4.19 | 69.3359 | 13.6719 | - | 285.44 KB |
| NewUri | new\CoreRun.exe | 75.56 us | 1.00 | 11.4746 | 1.5869 | - | 47.16 KB |

The time will likely improve a bit more when applying the change to range checks in #31860.

The code looks good.

> Turns out EscapeUnescapeIri was not incrementing the index when escaping a surrogate pair. That led to the low surrogate being escaped again, producing wrong results

Did the CI tests not detect this regression? If there weren't any tests for this condition, will you add new tests to verify the correct behavior to avoid future regressions?

@davidsh The tests I added will catch this as well.
For example, this test will return %F0%9F%BF%BE%EF%BF%BD instead of %F0%9F%BF%BE (note the extra %EF%BF%BD at the end - percent encoded replacement char).

It appears there were no tests with a surrogate pair that wasn't in the IRI range before.
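The failure mode can be reproduced outside of `Uri` with a small sketch. This is not the `EscapeUnescapeIri` implementation: it percent-encodes every character unconditionally, and the `skipLowSurrogate` flag is a hypothetical switch that toggles between the fixed and the buggy index handling.

```csharp
using System;
using System.Text;

static class SurrogateIndexSketch
{
    // Percent-encodes every character of 'input' as UTF-8.
    // When skipLowSurrogate is false, the index is not advanced past the
    // low surrogate, mimicking the bug described above.
    public static string EscapeAll(string input, bool skipLowSurrogate)
    {
        var dest = new StringBuilder();
        Span<byte> utf8 = stackalloc byte[4];

        for (int i = 0; i < input.Length; i++)
        {
            Rune rune;
            if (char.IsHighSurrogate(input[i])
                && i + 1 < input.Length
                && char.IsLowSurrogate(input[i + 1]))
            {
                rune = new Rune(input[i], input[i + 1]);
                if (skipLowSurrogate)
                {
                    i++; // the fix: consume both halves of the pair
                }
            }
            else if (!Rune.TryCreate(input[i], out rune))
            {
                // A lone surrogate cannot be encoded; fall back to U+FFFD.
                rune = Rune.ReplacementChar;
            }

            int written = rune.EncodeToUtf8(utf8);
            for (int b = 0; b < written; b++)
            {
                dest.Append('%').Append(utf8[b].ToString("X2"));
            }
        }
        return dest.ToString();
    }
}
```

With `skipLowSurrogate: true`, the input `"\ud83f\udffe"` produces `%F0%9F%BF%BE`. With `false`, the low surrogate is visited a second time on its own and becomes the replacement character, giving `%F0%9F%BF%BE%EF%BF%BD`.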

Test failures are unrelated

Looks good.

Also, wow -- check out the surrogate version of CheckIriUnicodeRange. It is bonkers!

jkotas added a commit that referenced this pull request

Feb 15, 2020

I have looked at the delta. I see an obvious bug with calling stackalloc in a loop that was caught by the failing tests.

> I have looked at the delta. I see an obvious bug with calling stackalloc in a loop that was caught by the failing tests.

Can static analyzers catch such cases?

> Can static analyzers catch such cases?

Good idea. Added a note to #30740
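For readers unfamiliar with the pitfall: memory obtained with `stackalloc` is released only when the method returns, so allocating inside a loop may grow the stack on every iteration. The sketch below contrasts the two shapes using hypothetical names; it is not the code from the referenced commit.

```csharp
using System;

static class StackallocLoopSketch
{
    // Risky shape: each iteration may reserve another 4 bytes of stack,
    // and none of it is released until the method returns. With a large
    // iteration count this can overflow the stack.
    public static int SumWithStackallocInLoop(int iterations)
    {
        int sum = 0;
        for (int i = 0; i < iterations; i++)
        {
            Span<byte> buffer = stackalloc byte[4];
            buffer[0] = 1;
            sum += buffer[0];
        }
        return sum;
    }

    // Safer shape: allocate the scratch buffer once, outside the loop,
    // and reuse it on every iteration.
    public static int SumWithHoistedStackalloc(int iterations)
    {
        Span<byte> buffer = stackalloc byte[4];
        int sum = 0;
        for (int i = 0; i < iterations; i++)
        {
            buffer[0] = 1;
            sum += buffer[0];
        }
        return sum;
    }
}
```

Both methods return the same result for small inputs; the difference only shows up as stack growth, which is why a static analyzer is a good fit for catching it.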

ghost locked as resolved and limited conversation to collaborators

Dec 10, 2020