<regex>: Improve matcher's skip optimization (original) (raw)

Split from #5452. Other than the performance fix in #5457, I listed a few optimizations that could be investigated for _Skip in this comment:

I also mentioned dynamic programming/memoization as a potential alternative to breadth-first search in a follow-up comment:

We memoize the first possible match for each branch and only evaluate again when we move beyond this position. That means we need to retain some kind of map between nodes and positions. Recognizing if we moved beyond a memoized position is complicated by the fact that the iterators on the searched string might not be random access iterators.

But I no longer think this alternative is viable because we can't retain this cache between regex_search calls, so memoization is not sufficient to avoid quadratic running time in all cases when the skip optimization tests all possible branches.

In addition, this approach also penalizes cases where regex_search is only called once or so and the regex (but not the first branch in the regex) matches close to the start of the searched string. In this case, this approach spends up to a linear amount of time to test the first branch, even though most of the string doesn't have to be read at all.