`<regex>`: Make capture groups in negative lookahead assertions always match nothing by muellerj2 · Pull Request #5366 · microsoft/STL
Fixes #5245 and completes #5269.
Changes compared to #5269:
- I reverted the last commit and an unrelated formatting change to the macro `_REGEX_CHAR_CLASS_NAME`.
- I separated the code for negative and positive lookahead assertions. This avoids an unnecessary memory allocation for positive lookahead assertions, as there is no need to store the match state of the capture groups in this case.
- I removed all state resets when the assertion results in match failure. Many other nodes also don't reset the state when they fail (and such a reset can't be made use of anyway, because this doesn't reset the state for any NFA nodes that were matched in previous iterations of the loop in `_Match_pat()`).
- I moved the assertion tests to new member functions. This is mainly related to the stack overflow problems: IIRC, `_Match_pat()` used more than 1KB of stack per recursive call in debug builds when I checked a few weeks ago. The local variable `_Bt_state<_It> _St` alone already consumes 56 bytes on x64 in debug mode (and 40 bytes in release mode). This should reduce stack usage in `_Match_pat()` a bit, making stack overflows a bit less common.
- I added a few simple tests to verify that the state resets are appropriate now.
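
A check in the spirit of the tests mentioned in the last bullet might look like the sketch below. It is not taken from the PR's test suite: the patterns, inputs, and the helper names `negative_lookahead_group_is_unmatched` and `positive_lookahead_group_is_retained` are made up for illustration. It relies only on the standard `std::regex` interface with the default ECMAScript grammar.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper, not part of the PR: returns true if capture group 1,
// which sits inside a negative lookahead, is reported as unmatched.
bool negative_lookahead_group_is_unmatched() {
    // The lookahead body "(a)b" first matches group 1 against 'a' and then
    // fails when comparing 'b' with 'c', so the negative assertion succeeds.
    const std::regex re("(?!(a)b)ac");
    const std::string input = "ac";
    std::smatch m;
    if (!std::regex_match(input, m, re)) {
        return false; // the overall match is expected to succeed
    }
    // Group 1 should not keep the partial match "a" made inside the assertion.
    return m.size() == 2 && !m[1].matched;
}

// Hypothetical helper, not part of the PR: returns true if capture group 1
// inside a positive lookahead keeps its match after the assertion.
bool positive_lookahead_group_is_retained() {
    const std::regex re("(?=(a))a");
    const std::string input = "a";
    std::smatch m;
    if (!std::regex_match(input, m, re)) {
        return false;
    }
    return m.size() == 2 && m[1].matched && m[1].str() == "a";
}

// Runs both checks; intended to be called from a test driver.
void run_lookahead_checks() {
    assert(negative_lookahead_group_is_unmatched());
    assert(positive_lookahead_group_is_retained());
}
```

An implementation that restores only the input position after a successful negative lookahead would presumably fail the first check, because group 1 would still hold the `a` matched inside the assertion.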
The most important change -- which had already been applied in the second-to-last commit in #5269 -- is the assignment `_Tgt_state = _St` when a negative lookahead assertion is successful. This resets the state of all capture groups, so in particular it resets the state of those capture groups that were matched at some point while processing the negative lookahead assertion. Before, only the position in the input string was reset via `_Tgt_state._Cur = _Cur`. The same line can still be found in the code for positive lookahead assertions, but it's correct there: Capture group matches are meaningful for positive lookahead assertions and are to be retained according to the ECMAScript standard.
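
The difference between the two assignments can be illustrated with a toy model. This is a simplified sketch, not the STL's code: `ToyState` is a made-up stand-in for `_Bt_state<_It>` (whose real layout differs), and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for the matcher's backtracking state (assumption:
// the real _Bt_state<_It> in <regex> is laid out differently).
struct ToyState {
    std::size_t cur = 0;             // position in the input
    std::vector<bool> group_matched; // one flag per capture group
};

// Models the old handling: only the input position is restored.
void restore_position_only(ToyState& tgt, const ToyState& saved) {
    tgt.cur = saved.cur;
}

// Models the new handling for negative lookaheads: the whole saved state is
// restored, discarding capture-group matches made inside the assertion.
void restore_full_state(ToyState& tgt, const ToyState& saved) {
    tgt = saved;
}

// Simulates a lookahead body that matches group 0 and consumes one character.
ToyState after_lookahead_body(ToyState st) {
    st.cur += 1;
    st.group_matched[0] = true;
    return st;
}

// Returns true if group 0 is still marked as matched after restoring only
// the position.
bool group_survives_position_only() {
    ToyState saved;
    saved.group_matched.assign(1, false);
    ToyState tgt = after_lookahead_body(saved);
    restore_position_only(tgt, saved);
    return tgt.cur == saved.cur && tgt.group_matched[0];
}

// Returns true if group 0 is still marked as matched after restoring the
// full state.
bool group_survives_full_restore() {
    ToyState saved;
    saved.group_matched.assign(1, false);
    ToyState tgt = after_lookahead_body(saved);
    restore_full_state(tgt, saved);
    return tgt.group_matched[0];
}
```

In this model, restoring only the position leaves the capture flag set, while copying the whole saved state clears it, which is the behavior wanted for negative lookaheads. For positive lookaheads, keeping the capture flag is the intended outcome.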