<regex>: Avoid stack growth in simple loops by muellerj2 · Pull Request #5939 · microsoft/STL
To recap from #5889, simple loops have the following properties:
1. They are non-reentrant.
2. They are branchless.
3. Each repetition matches strings of the same length.
4. Each capturing group matched in a repetition has the same relative position to the beginning and end of the strings matched by the repetition.
The matcher has always used properties 1 and 2 of such loops. It also took slight advantage of property 3: As a corollary, all repetitions match the empty string iff the first repetition matches the empty string, so the matcher only checked whether the first repetition is empty.
But properties 3 and 4 can be exploited further: If we know that each successful repetition shifts the matched string and the capturing groups by the same distance, we do not have to explicitly store these positions on the stack for unwinding greedy matching but can restore the positions while backtracking by shifting the positions in the other direction by the same amount. As for non-greedy matching, failing to match the next repetition will immediately result in backtracking beyond the first repetition, so we actually do not even have to know the length of the strings matched by each repetition, but only have to allow backtracking to proceed for the first stack frames that were pushed while matching the first repetition.
At least for greedy matching, though, we can't easily avoid pushing these stack frames while matching the next repetition, because they are still needed to restore the match state when backtracking from the last attempted match.
From the second repetition on, this PR pops the stack frames pushed while matching a repetition, without any further processing, once that repetition has matched. Thus, the stack stops growing while the matcher processes the simple loop.
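As a rough illustration of the greedy case, the following toy matcher matches a fixed string `unit` repeatedly, followed by `tail`. It is hypothetical and not the STL implementation; `match_greedy_fixed_loop` and its parameters are made up for this sketch. It stores no per-repetition position: while backtracking, it recovers the previous position by subtracting the repetition length.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Toy sketch (assumption: not the STL matcher). Matches `unit` repeated at
// least `min_reps` times, greedily, followed by `tail`, against all of `input`.
// Returns the number of repetitions in the successful match, if any.
std::optional<std::size_t> match_greedy_fixed_loop(
    std::string_view input, std::string_view unit, std::string_view tail, std::size_t min_reps) {
    const std::size_t rep_length = unit.size();
    if (rep_length == 0) {
        return std::nullopt; // empty repetitions are out of scope for this sketch
    }

    std::size_t pos  = 0;
    std::size_t reps = 0;

    // Greedy phase: advance by a constant distance; no position is saved per repetition.
    while (input.substr(pos, rep_length) == unit) {
        pos += rep_length;
        ++reps;
    }

    // Backtracking phase: shift the position back by the same constant distance.
    for (;;) {
        if (reps >= min_reps && input.substr(pos) == tail) {
            return reps;
        }
        if (reps <= min_reps) {
            return std::nullopt; // cannot give up any more repetitions
        }
        pos -= rep_length;
        --reps;
    }
}
```

For example, `match_greedy_fixed_loop("aaaab", "a", "ab", 0)` first consumes four `a`s, fails to match the tail, shifts back by one repetition and then succeeds with three repetitions.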
This is probably the most intricate PR since the start of the non-recursive matcher PR, because it keeps tampering with this stack and does not just pop from the stack, but even repeatedly modifies two special stack frames that were pushed earlier while processing the loop's `_N_rep` and `_N_end_rep` nodes. However, I believe the performance benefit for simple loops is worth this complication (especially because I hope to extend this optimization to even more loops that are currently not marked simple).
The two special stack frames (that can be recognized in the code by the assignment of opcode `_Do_nothing` to them in some cases) are used as follows:
- When the loop is entered, the first stack frame (at this point identified by index `_Loop_vals[_Node->_Loop_number]._Loop_frame_idx`) stores the initial position in the searched string at the start of the loop. If matching is greedy and the minimum number of repetitions is zero, the opcode is `_Loop_simple_greedy_firstrep` (to set up tail matching if even matching the first repetition fails), otherwise it is `_Do_nothing`. Backtracking during non-greedy matching on the other hand is handled by pushing an additional stack frame with opcode `_Loop_simple_nongreedy`.
- After the first repetition matched successfully, a second stack frame is pushed. The first stack frame's index is now stored in the second stack frame's `_Loop_frame_idx_sav` member, while `_Loop_vals[_Nr->_Loop_number]._Loop_frame_idx` points to the second stack frame. The second stack frame's iterator is generally changed to point to the current position in the input string (except when its opcode is `_Do_nothing`). The frame's code is assigned as follows:
  - If the loop is matched greedily and the minimum number of repetitions is zero (i.e., if the first frame has opcode `_Loop_simple_greedy_firstrep` assigned), the code of the second frame is initialized to `_Loop_simple_greedy_lastrep`.
  - If the minimum number of repetitions has not been reached, the second frame has opcode `_Do_nothing`.
  - If the minimum number of repetitions is reached and the loop is matched greedily, the opcode is changed to `_Loop_simple_greedy_lastrep`.
  - If the minimum number of repetitions is reached and the loop is matched non-greedily, the opcode is changed to `_Loop_simple_nongreedy` (each time, because the contents might have been overwritten in-between).

Meanwhile, the position in the first stack frame is used during backtracking from greedy matching to indicate when we have backtracked beyond the second repetition or the minimum number of repetitions. It is set to the start position of the first repetition or the repetition whose match resulted in reaching the minimum. If these positions in the input string are reached while backtracking successive repetitions of the loop, the backtracking logic for non-initial repetitions is stopped and the normal stack unwinding logic is allowed to proceed again.
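The opcode rules for the second special frame can be condensed into a small decision function. This is only a sketch: `ToyOpcode` and `second_frame_opcode` are hypothetical names that mirror the opcodes mentioned above, and the function covers only the four listed rules (not the later switch to the intermediate-repetition opcode when the maximum is reached).

```cpp
#include <cstddef>

// Hypothetical stand-ins for the matcher's opcodes (assumption: illustrative only).
enum class ToyOpcode { do_nothing, greedy_lastrep, nongreedy };

// Opcode of the second special frame after `reps_done` successful repetitions.
// With a minimum of zero, the minimum is already reached after the first
// repetition, which covers the "initialized to lastrep" rule for greedy loops.
constexpr ToyOpcode second_frame_opcode(bool greedy, std::size_t min_reps, std::size_t reps_done) {
    if (reps_done < min_reps) {
        return ToyOpcode::do_nothing; // minimum not reached: nothing to unwind to yet
    }
    return greedy ? ToyOpcode::greedy_lastrep : ToyOpcode::nongreedy;
}

static_assert(second_frame_opcode(true, 0, 1) == ToyOpcode::greedy_lastrep);
static_assert(second_frame_opcode(true, 3, 1) == ToyOpcode::do_nothing);
static_assert(second_frame_opcode(false, 2, 2) == ToyOpcode::nongreedy);
```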
Because positions must be shifted back during greedy matching, the iterators of the input string must be decremented during backtracking (to either calculate the position where the previous repetition stopped or to move the start and end positions of capturing groups accordingly). The standard requires the provided iterators to be bidirectional, so the matcher must always be able to perform such decrements. But I think the matcher has only required forward iterators in practice before this PR, and I think the matcher will enter an endless loop after this PR if assertions are disabled (because `std::advance()` will just not shift the iterator by a negative distance if the iterator isn't bidirectional). For this reason, this PR also adds static assertions checking the bidirectional iterator requirement.
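A check of this kind could look roughly like the following. This is a minimal sketch, not the assertion actually added by the PR: `is_bidi_iter_v` and `shift_back` are hypothetical names, and the trait simply inspects `iterator_category`.

```cpp
#include <forward_list>
#include <iterator>
#include <list>
#include <type_traits>

// Hypothetical trait: true if the iterator category is at least bidirectional.
template <class It>
constexpr bool is_bidi_iter_v = std::is_base_of_v<std::bidirectional_iterator_tag,
    typename std::iterator_traits<It>::iterator_category>;

// Hypothetical helper: move a position back by the length of one repetition.
// The static assertion rejects forward-only iterators at compile time.
template <class BidIt>
BidIt shift_back(BidIt pos, typename std::iterator_traits<BidIt>::difference_type rep_length) {
    static_assert(is_bidi_iter_v<BidIt>, "shifting positions back requires bidirectional iterators");
    std::advance(pos, -rep_length);
    return pos;
}

static_assert(is_bidi_iter_v<std::list<char>::iterator>);
static_assert(!is_bidi_iter_v<std::forward_list<char>::iterator>);
```

With this in place, calling `shift_back` with a `std::forward_list` iterator fails to compile instead of silently not moving the iterator.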
Individual changes
- Add assertions checking the bidi iterator requirement to `regex_match()`, `regex_search()` and `regex_replace()`.
- Add a new member `_Rep_length` to (renamed) `_Loop_vals_v3_t`, which will hold the length of the first (and thus every) repetition for simple loops after matching the first repetition. The storage is now templated on the difference type of the input string iterator.
- Split the opcode for greedy simple loops into three: One for backtracking from the first repetition, one for backtracking from the last attempted repetition (which is not the first one) and one for backtracking for any intermediate repetition. These cases have to be handled differently now:
  - Backtracking from the initial repetition keeps the same logic as before.
  - Backtracking from the last attempted repetition now has to additionally push another stack frame again for backtracking the prior repetition, if the prior repetition is not the first or minimum repetition.
  - Backtracking from the intermediate repetition additionally has to shift the start and end positions of capturing groups by the length of the loop.
- In the handler of `_N_rep`, merge the `_Loop_simple_greedy` stack frame into the previously pushed stack frame with code `_Do_nothing` by changing the code of the former to `_Loop_simple_greedy_firstrep`.
- In the handler of `_N_end_rep` for simple loops:
  - After the first repetition, determine the length of this (and thus every) repetition. Perform the original logic and exit the handler if this length is zero. If not, push a new special stack frame and store its position in `_Sav._Loop_frame_idx` (while storing the position of the first special one to `_Frame._Loop_frame_idx` in this second frame). The code of the second stack frame is initialized to `_Loop_simple_greedy_lastrep` if the first one's code is `_Loop_simple_greedy_firstrep` (i.e., backtracking from greedy matching might happen until the very first repetition), else it's set to `_Do_nothing` for now.
  - After any following repetition, pop all stack frames pushed while matching this repetition by setting `_Frames_count` to `_Sav._Loop_frame_idx + 1`, keeping only the second special frame around.
  - If greedy matching is performed:
    * Check if this repetition reached the minimum number of repetitions (which is the case if the second stack frame's code hasn't been changed from `_Do_nothing` yet). If so, set the iterator of the first special stack frame (with code `_Do_nothing` as well) to the start of the prior repetition and change the second stack frame's code to `_Loop_simple_greedy_lastrep`.
    * Update the second stack frame's iterator to the current position in the input string.
    * If the maximum hasn't been reached yet, increment the loop counter and set the next node pointer to the start node of the loop.
    * If the maximum has been reached, set up tail matching. If this is backtracked from, we are essentially handling the repetition before the last one (and thus have to shift the capturing groups), so the code in the second special stack frame has to be changed to `_Loop_simple_greedy_intermediaterep`.
  - If non-greedy matching is performed:
    * Set up the second special stack frame for non-greedy unwinding. (We know this stack frame must exist in the `_Frames` vector, so we can avoid calling `_Push_frame()`. However, we have to reset its members as necessary because its contents might have been overwritten.)
- In the stack unwinding loop:
  - Make the handler of `_Loop_simple_greedy` the one for `_Loop_simple_greedy_firstrep`.
  - Copy the logic of `_Loop_simple_greedy_firstrep` for the handler of `_Loop_simple_greedy_lastrep` and add the logic to set up the stack frame for unwinding to the prior repetition in case of match failure.
  - Put the handler for `_Loop_simple_greedy_intermediaterep` before `_Loop_simple_greedy_lastrep`, add code to shift the start and end iterators of the capturing groups. The capturing groups matched by each repetition are identified by walking the stack frames between the first and second special stack frame. After adjusting the capturing groups, fall through to the `_Loop_simple_greedy_lastrep` handler.
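Two of the mechanics above, truncating the frame stack by resetting a frame count and shifting capturing groups back by one repetition length, can be pictured with a toy model. All names here (`ToyFrame`, `ToyStack`, `ToyCapture`, `shift_captures_back`) are hypothetical, and positions are plain indices rather than iterators; the real matcher's data structures differ.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical frame: an opcode plus a saved position in the input.
struct ToyFrame {
    int opcode;
    std::size_t pos;
};

// Hypothetical stack that keeps its storage and tracks a logical frame count.
struct ToyStack {
    std::vector<ToyFrame> frames;
    std::size_t frames_count = 0;

    // Pushes a frame, reusing storage left over from earlier pops; returns its index.
    std::size_t push(ToyFrame frame) {
        if (frames_count == frames.size()) {
            frames.push_back(frame);
        } else {
            frames[frames_count] = frame;
        }
        return frames_count++;
    }

    // Drops every frame above `loop_frame_idx` without processing it.
    void truncate_after(std::size_t loop_frame_idx) {
        frames_count = loop_frame_idx + 1;
    }
};

// Hypothetical capture group as a half-open index range.
struct ToyCapture {
    std::size_t begin;
    std::size_t end;
};

// Moves every capture back by one repetition length, as needed when
// backtracking from an intermediate repetition of a simple loop.
void shift_captures_back(std::vector<ToyCapture>& captures, std::size_t rep_length) {
    for (ToyCapture& capture : captures) {
        capture.begin -= rep_length;
        capture.end -= rep_length;
    }
}
```

Because `truncate_after` only resets the count, the popped frames stay in the vector and may be overwritten by later pushes, which is why a frame known to exist can be reused but must have its members reset.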
Tests
The tests verify that backtracking from loops still works and capturing groups are set correctly despite these intricate stack manipulations. Backreferences are used to verify the contents of the capturing groups.
In the non-greedy case, failing to match a single repetition means that the loop is backtracked from completely. This is why a single test case verifying that the capturing group is unmatched is sufficient here.
In the greedy case, we have to verify that three different opcodes are handled correctly during unwinding, and backtracking after failing the last attempted repetition might stop at any repetition in-between. Moreover, special handling is necessary when the maximum number of repetitions is reached or when backtracking beyond the second or the minimum repetition. The tests are chosen to provide coverage for all these cases.
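A check in the spirit of these tests might look like the sketch below. It is not taken from the PR's test suite: the helper `group1_after_match` and the pattern `(.)*b\1` are chosen here for illustration. The backreference only matches if the capturing group holds the text of the last repetition that survived backtracking.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text of capture group 1 if `input` fully
// matches `pattern` (ECMAScript grammar), or "<no match>" otherwise.
std::string group1_after_match(const std::string& pattern, const std::string& input) {
    const std::regex re(pattern);
    std::smatch match;
    if (!std::regex_match(input, match, re)) {
        return "<no match>";
    }
    return match[1].str();
}
```

For `"xyzbz"`, the loop `(.)*` first consumes the whole input and then has to backtrack to three repetitions, so group 1 must end up as `"z"` for `\1` to match the final character. For `"xyzbx"`, no number of repetitions lets both `b` and `\1` match, so the whole match fails.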
Benchmark
| benchmark | before [ns] | after [ns] | speedup |
|---|---|---|---|
| bm_match_sequence_of_as/"a*"/100 | 2148.44 | 1286.97 | 1.67 |
| bm_match_sequence_of_as/"a*"/200 | 3379.61 | 2343.75 | 1.44 |
| bm_match_sequence_of_as/"a*"/400 | 5580.36 | 4425.92 | 1.26 |
| bm_match_sequence_of_as/"a*?"/100 | 1967.08 | 2040.32 | 0.96 |
| bm_match_sequence_of_as/"a*?"/200 | 3717.91 | 3683.04 | 1.01 |
| bm_match_sequence_of_as/"a*?"/400 | 6835.94 | 6975.45 | 0.98 |
| bm_match_sequence_of_as/"(?:a)*"/100 | 2622.77 | 1757.81 | 1.49 |
| bm_match_sequence_of_as/"(?:a)*"/200 | 4237.58 | 3247.08 | 1.31 |
| bm_match_sequence_of_as/"(?:a)*"/400 | 7952.01 | 5998.88 | 1.33 |
| bm_match_sequence_of_as/"(a)*"/100 | 3989.95 | 2786.7 | 1.43 |
| bm_match_sequence_of_as/"(a)*"/200 | 6835.94 | 5312.5 | 1.29 |
| bm_match_sequence_of_as/"(a)*"/400 | 32994.1 | 9416.81 | 3.50 |
| bm_match_sequence_of_as/"(a)*?"/100 | 4541.02 | 3288.92 | 1.38 |
| bm_match_sequence_of_as/"(a)*?"/200 | 7847.38 | 6417.41 | 1.22 |
| bm_match_sequence_of_as/"(a)*?"/400 | 20402.9 | 11474.6 | 1.78 |
| bm_match_sequence_of_as/"(?:b|a)*"/100 | 3923.69 | 4589.84 | 0.85 |
| bm_match_sequence_of_as/"(?:b|a)*"/200 | 7149.83 | 8021.76 | 0.89 |
| bm_match_sequence_of_as/"(?:b|a)*"/400 | 13183.5 | 15066.9 | 0.87 |
| bm_match_sequence_of_as/"(b|a)*"/100 | 6417.41 | 6835.94 | 0.94 |
| bm_match_sequence_of_as/"(b|a)*"/200 | 16043.5 | 20996.1 | 0.76 |
| bm_match_sequence_of_as/"(b|a)*"/400 | 53013.4 | 52550.4 | 1.01 |
| bm_match_sequence_of_as/"(a)(?:b|a)*"/100 | 4464.29 | 4499.17 | 0.99 |
| bm_match_sequence_of_as/"(a)(?:b|a)*"/200 | 7672.99 | 8196.15 | 0.94 |
| bm_match_sequence_of_as/"(a)(?:b|a)*"/400 | 14125.2 | 14997.2 | 0.94 |
| bm_match_sequence_of_as/"(a)(b|a)*"/100 | 6406.25 | 6406.25 | 1.00 |
| bm_match_sequence_of_as/"(a)(b|a)*"/200 | 14648.4 | 14125.2 | 1.04 |
| bm_match_sequence_of_as/"(a)(b|a)*"/400 | 53013.4 | 56250 | 0.94 |
| bm_match_sequence_of_as/"(a)(?:b|a)*c"/100 | 5161.83 | 5859.38 | 0.88 |
| bm_match_sequence_of_as/"(a)(?:b|a)*c"/200 | 10253.9 | 9835.34 | 1.04 |
| bm_match_sequence_of_as/"(a)(?:b|a)*c"/400 | 18415.3 | 18415.3 | 1.00 |