Reduce unnecessary Regex match attempts for expressions beginning with atomic loops by stephentoub · Pull Request #35824 · dotnet/runtime

Optimize Regex expressions that begin with an atomic unbounded loop, whether the dev wrote it that way or the system detected that the loop could be made atomic and did so implicitly. The optimization updates the scan loop's starting position for the next iteration to be where the loop ended rather than where it started.
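To illustrate why this saves work, here is a toy model of a scan loop for a pattern shaped like `(?>[set]+)suffix`. It is only a sketch under stated assumptions, not the actual .NET implementation: the function `scan`, its parameters, and the sample input are all hypothetical. It relies on the fact that an atomic loop never gives characters back, so any start position inside the run it just consumed would reach the same end position and fail the same way.

```python
from typing import Callable, List, Tuple


def scan(text: str, in_loop: Callable[[str], bool], suffix: str,
         skip_to_loop_end: bool) -> Tuple[List[Tuple[int, int]], int]:
    """Toy scan loop for a pattern shaped like (?>[set]+)suffix.

    Hypothetical helper for illustration only. Returns the (start, end)
    spans of all matches and the number of match attempts made.
    """
    matches: List[Tuple[int, int]] = []
    attempts = 0
    pos = 0
    while pos < len(text):
        attempts += 1
        # Atomic greedy loop: consume as many characters as possible
        # and never give any of them back.
        end = pos
        while end < len(text) and in_loop(text[end]):
            end += 1
        if end > pos and text.startswith(suffix, end):
            # The loop matched at least once and the suffix follows.
            matches.append((pos, end + len(suffix)))
            pos = end + len(suffix)
        elif skip_to_loop_end and end > pos:
            # Optimized bump-along: every start in pos+1 .. end-1 would
            # run the loop to the same `end` and fail on the same suffix
            # check, so resume scanning where the loop ended.
            pos = end
        else:
            # Naive bump-along: retry from the very next character.
            pos += 1
    return matches, attempts
```

Calling `scan("hello world foo@ bar", str.isalpha, "@", False)` and then again with `True` should yield the same matches in both modes, with the second call making fewer attempts because it does not re-enter the runs `hello`, `world`, and `bar` one character at a time.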

Running the examples from https://github.com/mariomka/regex-benchmark/blob/652d55810691ad88e1c2292a2646d301d3928903/csharp/Benchmark.cs#L20-L26 (each result line is the elapsed time in milliseconds followed by the match count):

Before (RegexOptions.None):

558.089 - 92
380.8639 - 5301
42.2791 - 5

After (RegexOptions.None):

195.73 - 92
97.2725 - 5301
42.3631 - 5

Before (RegexOptions.ECMAScript):

435.6631 - 92
299.1351 - 5301
41.6416 - 5

After (RegexOptions.ECMAScript):

184.59 - 92
89.0295 - 5301
42.6699 - 5

Before (RegexOptions.Compiled):

279.0234 - 92
186.4688 - 5301
16.3892 - 5

After (RegexOptions.Compiled):

137.4005 - 92
57.136 - 5301
15.6309 - 5

Before (RegexOptions.Compiled | RegexOptions.ECMAScript):

204.9634 - 92
113.7158 - 5301
15.7937 - 5

After (RegexOptions.Compiled | RegexOptions.ECMAScript):

127.83 - 92
49.2225 - 5301
15.7834 - 5

@danmosemsft, this is the optimization you and I discussed offline.

cc: @eerhardt, @pgovind