`<regex>`: Simplify matching of `_N_if` NFA nodes in leftmost-longest mode by muellerj2 · Pull Request #5405 · microsoft/STL
Since we settled on some reasonable semantics for leftmost-longest matching in #5218, I think we should remove the code for another (abandoned) attempt to implement the leftmost-longest rule in `_Matcher::_Do_if`: an attempt to set `_Tgt_state` to the leftmost-longest match found under this `_N_if` node.
This is neither necessary (in leftmost-longest mode, we take the final result from `_Res`, not `_Tgt_state`) nor sufficient (it assigns one of the longest matches to `_Tgt_state`, but if there are several of the same length, it doesn't necessarily pick the correct one among these matches).
Removing this code also saves 96 bytes of stack space per call to `_Do_if` in debug mode, slightly alleviating the stack overflow issues (#997, #1528).
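For readers less familiar with the rule in question, here is a minimal sketch of what leftmost-longest matching means at the level of the public `std::regex` API. It does not touch the matcher internals discussed in this PR, and the helper name `first_match` is made up for illustration. With the POSIX `extended` grammar, a conforming implementation should report the longest of the alternatives that match at the leftmost position, while the `ECMAScript` grammar should pick the first alternative that matches.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of the STL): returns the text of the first
// match of `pattern` in `subject` under the given grammar, or an empty
// string if there is no match.
std::string first_match(const std::string& pattern,
                        std::regex::flag_type grammar,
                        const std::string& subject) {
    const std::regex re(pattern, grammar);
    std::smatch m;
    return std::regex_search(subject, m, re) ? m.str() : std::string();
}
```

Calling `first_match("a|ab|abc", std::regex::extended, "abcd")` exercises the leftmost-longest rule, while the same call with `std::regex::ECMAScript` exercises depth-first alternation, so comparing the two results shows the difference between the two modes.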
Potentially leaving `_Tgt_state` in a garbage state is fine here, because none of the functions further up on the call stack rely on its value:
- `_Match_pat` in the `_N_if` case: Will just return immediately to its caller without changing `_Tgt_state`.
- `_Do_rep0`: Can't be a caller of `_Match_pat` because a repetition containing an `_N_if` node anywhere is not simple.
- `_Do_assert`/`_Do_neg_assert`: Can't be the callers of `_Match_pat` because there are no lookahead assertions in the POSIX grammars that demand application of the leftmost-longest rule. (But even if there were lookahead assertions -- notwithstanding their currently unknown semantics -- it would be fine in the sense that the matcher wouldn't crash: `_Do_assert` would reset the position pointer to a savepoint, while `_Do_neg_assert` would fail, resulting in `_Match_pat` returning immediately to its caller so that the remaining analysis here applies. Even so, there is the issue that `_Do_assert` would not reset the capture groups in `_Tgt_state` -- but we don't know what to set them to either as long as we don't know the semantics of such assertions. Nevertheless, all "valid" capture groups would still point to legal ranges in the input, so even the matcher with this PR's change wouldn't crash. This means we wouldn't have to worry about a newer parser emitting assertion nodes, because running them with the old matcher would at worst produce wrong results. To get correct semantics, an updated parser and matcher are required, but this would also be the case if we didn't do this PR's change.)
- Another `_Do_if`: Will either reset `_Tgt_state` to some savepoint before calling `_Match_pat` or immediately return to its caller and leave `_Tgt_state` as-is.
- `_Do_rep`: Will either reset `_Tgt_state` to some savepoint before doing the next `_Match_pat` call or return to its caller while leaving `_Tgt_state` as-is.
- `_Match_pat` in the `_N_rep` or `_N_end_rep` cases: Will just return immediately to its caller without changing `_Tgt_state`.
- `_Match`: Will evaluate `_Res` in leftmost-longest mode, not `_Tgt_state`. (`_Match` only evaluated `_Res` before `<regex>`: Fix depth-first and leftmost-longest matching rules #5218 as well, so this change also doesn't pose a problem if old and new functions are mixed.)
So in all cases, either `_Tgt_state` isn't evaluated anymore or it is reset to some savepoint before it is used again.
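The invariant behind this argument can be illustrated with a deliberately tiny toy model. This is only a sketch under stated assumptions, not the STL's matcher: `ToyMatcher`, `ToyState`, `scratch`, and `result` are hypothetical names, with `scratch` loosely standing in for a working state like `_Tgt_state` and `result` for a separate longest-match record like `_Res`. The toy alternation resets the working state from a savepoint before every attempt and never copies the best match back into it.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: all names here are hypothetical, not STL internals.
struct ToyState {
    std::size_t pos = 0; // current position in the input
};

struct ToyMatcher {
    std::string input;
    ToyState scratch;        // working state, mutated by every attempt
    ToyState result;         // longest match recorded so far
    bool has_result = false;

    // Try to match one literal at the current scratch position.
    bool match_literal(const std::string& lit) {
        if (input.compare(scratch.pos, lit.size(), lit) != 0) {
            return false;
        }
        scratch.pos += lit.size();
        // The final answer is recorded separately, only if strictly longer.
        if (!has_result || scratch.pos > result.pos) {
            result = scratch;
            has_result = true;
        }
        return true;
    }

    // Try every alternative from the same starting point.
    bool match_alternatives(const std::vector<std::string>& alts) {
        const ToyState savepoint = scratch;
        bool any = false;
        for (const auto& alt : alts) {
            scratch = savepoint; // reset before each attempt
            any = match_literal(alt) || any;
        }
        // scratch is left as whatever the last attempt produced; callers
        // read result (or reset scratch themselves), never scratch.
        return any;
    }
};
```

For input `"abcd"` and the alternatives `a`, `abc`, `ab` (in that order), `scratch` ends up at the position left by the last attempt rather than the longest one, yet `result` still holds the longest match. In this toy, at least, restoring the working state to the best match would add cost without changing the outcome.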