<regex>: Avoid generating unnecessary single-branch _N_if nodes by muellerj2 · Pull Request #5539 · microsoft/STL (original) (raw)

The parser likes to generate single-branch _N_if nodes in the NFA for no good reason. These nodes have a noticeable runtime cost in the matcher. This PR avoids generating these nodes.

This change has side effects:

Without further changes, there would be more side effects because some loops would get classified as simple now. However, I preempted this reclassification by adjusting _Parser2::_Calculate_loop_simplicity(). There are two reasons for this.

I added no new tests because (a) there is no obvious additional set of regexes to test and (b) this change affects the generated NFAs for all regexes without exception (all of them contained at least one single-branch _N_if node before), so the whole existing test suite serves as test coverage for this change.

Benchmark

Benchmark Before After Speedup
bm_lorem_search/"bibe"/2 31145 ns 28878 ns 1.08
bm_lorem_search/"bibe"/3 61384 ns 57199 ns 1.07
bm_lorem_search/"bibe"/4 122768 ns 114397 ns 1.07
bm_lorem_search/"(bibe)"/2 57812 ns 43945 ns 1.32
bm_lorem_search/"(bibe)"/3 141246 ns 92072 ns 1.53
bm_lorem_search/"(bibe)"/4 284934 ns 187976 ns 1.52
bm_lorem_search/"(bibe)+"/2 92072 ns 62779 ns 1.47
bm_lorem_search/"(bibe)+"/3 179983 ns 122768 ns 1.47
bm_lorem_search/"(bibe)+"/4 376607 ns 256319 ns 1.47
bm_lorem_search/"(?:bibe)+"/2 55804 ns 48652 ns 1.15
bm_lorem_search/"(?:bibe)+"/3 112305 ns 96257 ns 1.17
bm_lorem_search/"(?:bibe)+"/4 224609 ns 196725 ns 1.14