<regex>: Process disjunctions non-recursively by muellerj2 · Pull Request #5745 · microsoft/STL

Towards #997 and #1528.

We have to handle disjunctions (represented by _N_if nodes) differently depending on longest mode. When _Longest is true, we have to evaluate all alternatives during unwinding, regardless of whether they matched successfully. Otherwise, we may always (and usually must) stop evaluating alternatives after one has succeeded. The former is implemented by the case _Disjunction_eval_alt_always, the latter by _Disjunction_eval_alt_on_failure. We can implement the latter case by falling through to the former.

During unwinding, _Frame already holds the correct values to evaluate further alternatives of the disjunction, except for the NFA node it points to. So we can just update the node pointer and "push" _Frame onto the stack again by incrementing _Frames_count.
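The unwinding scheme described above can be illustrated with a minimal, self-contained sketch. This is a toy model and not the code in this PR: alternatives are plain strings matched as prefixes of the input, and every name in it (`FrameCode`, `Frame`, `Outcome`, `match_disjunction`, `next_alt`) is hypothetical. The index `next_alt` stands in for the NFA node pointer stored in the frame.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-ins for the two unwinding cases named in the description.
enum class FrameCode { eval_alt_on_failure, eval_alt_always };

struct Frame {
    FrameCode code;
    std::size_t next_alt; // toy replacement for the NFA node pointer
};

struct Outcome {
    bool last_result = false;    // result of the last evaluated alternative
    bool matched = false;        // true once any alternative has matched
    std::size_t best_length = 0; // length of the recorded match
    std::size_t evaluated = 0;   // number of alternatives that were tried
};

// Tries each alternative as a prefix of `input` without recursion.
inline Outcome match_disjunction(
    const std::vector<std::string>& alts, std::string_view input, bool longest) {
    Outcome out;
    if (alts.empty()) {
        return out;
    }

    std::vector<Frame> frames(1);
    std::size_t frames_count = 0;
    frames[frames_count++] =
        Frame{longest ? FrameCode::eval_alt_always : FrameCode::eval_alt_on_failure, 1};

    std::size_t current = 0;
    for (;;) {
        // Forward phase: evaluate the current alternative.
        const std::string& alt = alts[current];
        const bool ok = input.substr(0, alt.size()) == alt;
        ++out.evaluated;
        if (ok && (!out.matched || alt.size() > out.best_length)) {
            out.best_length = alt.size();
        }
        out.matched = out.matched || ok;
        out.last_result = ok;

        // Unwinding phase: pop frames until one resumes the forward phase.
        bool resumed = false;
        while (frames_count > 0 && !resumed) {
            Frame& frame = frames[--frames_count]; // pop
            bool try_next = frame.next_alt < alts.size();
            switch (frame.code) {
            case FrameCode::eval_alt_on_failure:
                if (ok) {
                    try_next = false; // one alternative succeeded, so stop
                }
                [[fallthrough]];
            case FrameCode::eval_alt_always:
                if (try_next) {
                    current = frame.next_alt++; // only the "node pointer" changes
                    ++frames_count;             // "push" the same frame again
                    resumed = true;
                }
                break;
            }
        }
        if (!resumed) {
            return out;
        }
    }
}
```

Under these assumptions, `match_disjunction({"a", "ab", "abc"}, "abcd", false)` stops after the first alternative, while passing `true` for `longest` evaluates all three and records the longest one. The frame is never rebuilt: re-incrementing the count reuses the saved state, which is the point of the approach described above.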

In longest mode, _Match_pat() now returns false when matching the last evaluated alternative failed, even if an alternative evaluated earlier matched successfully. _Matcher3::_Match() can deal with this behavior change by also checking the member _Matched, which is set to true in longest mode as soon as the first successful match is found. (Note that in longest mode, the return value of _Match_pat() mostly doesn't matter while the matcher evaluates possible trajectories. The only exceptions are the _Match_pat(_Node->_Next) calls found in _Do_rep0(), which handles simple loops. But the return values of these calls are unaffected by this change in the evaluation of _N_if nodes because simple loops are guaranteed to be branchless.)
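The effect on the caller can be shown with a small toy matcher. This is a hedged sketch only: `ToyMatcher` and its members are hypothetical names, alternatives are plain strings matched as prefixes, and the real `_Matcher3` is considerably more involved. The member `matched` plays the role that the description assigns to `_Matched`.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical toy matcher; it only mimics the return-value behavior described above.
struct ToyMatcher {
    std::vector<std::string> alternatives;
    bool longest = false;
    bool matched = false;        // set in longest mode once any alternative matched
    std::size_t best_length = 0; // length of the recorded match

    // Returns the result of the LAST evaluated alternative, which in longest
    // mode may be false even though an earlier alternative matched.
    bool match_pat(std::string_view input) {
        bool last = false;
        for (const std::string& alt : alternatives) {
            last = input.substr(0, alt.size()) == alt;
            if (!last) {
                continue;
            }
            if (!longest) {
                best_length = alt.size();
                break; // outside longest mode, stop at the first success
            }
            matched = true;
            best_length = std::max(best_length, alt.size());
        }
        return last;
    }

    // The caller combines the return value with the `matched` flag.
    bool match(std::string_view input) {
        matched = false;
        best_length = 0;
        const bool last = match_pat(input);
        return last || matched;
    }
};
```

With the alternatives `ab`, `a`, `x` and the input `abc` in longest mode, `match_pat()` returns false because `x` is evaluated last and fails, yet `match()` still reports success with a recorded length of 2.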