<regex>: Avoid stack growth in simple loops by muellerj2 · Pull Request #5939 · microsoft/STL

To recap from #5889, simple loops have the following properties:

  1. They are non-reentrant.
  2. They are branchless.
  3. Each repetition matches strings of the same length.
  4. Each capturing group matched in a repetition has the same relative position to the beginning and end of the strings matched by the repetition.

The matcher has always used properties 1 and 2 of such loops. It also took slight advantage of property 3: As a corollary, all repetitions match the empty string iff the first repetition matches the empty string, so the matcher only checked whether the first repetition is empty.

But properties 3 and 4 can be exploited further: If we know that each successful repetition shifts the matched string and the capturing groups by the same distance, we do not have to explicitly store these positions on the stack for unwinding greedy matching but can restore the positions while backtracking by shifting the positions in the other direction by the same amount. As for non-greedy matching, failing to match the next repetition will immediately result in backtracking beyond the first repetition, so we actually do not even have to know the length of the strings matched by each repetition, but only have to allow backtracking to proceed for the first stack frames that were pushed while matching the first repetition.
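The greedy case can be illustrated with a toy model. This is a minimal sketch and not the matcher's actual code: the function `match_fixed_length_loop` and its parameters are hypothetical. It assumes a loop whose body is a fixed literal `unit` followed by a literal `tail`, and it models only the current position, although capturing group positions would shift by the same distance. Because every repetition has the same length, backtracking subtracts that length instead of reading saved positions from a stack.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical toy model: matches (?:unit)*tail against the whole input.
// Returns the number of repetitions used by the first successful match,
// or -1 if no number of repetitions leads to a match.
std::ptrdiff_t match_fixed_length_loop(
    std::string_view input, std::string_view unit, std::string_view tail) {
    if (unit.empty()) {
        // An empty repetition never advances; treat the loop as zero repetitions.
        return input == tail ? 0 : -1;
    }

    const std::size_t step = unit.size(); // every repetition shifts by this amount
    std::size_t pos = 0;
    std::ptrdiff_t reps = 0;

    // Greedy phase: take as many repetitions as possible.
    // Only a counter is kept; no per-repetition position is stored.
    while (input.substr(pos, step) == unit) {
        pos += step;
        ++reps;
    }

    // Backtracking phase: give up one repetition at a time by shifting
    // the position back by the fixed repetition length.
    for (;;) {
        if (input.substr(pos) == tail) {
            return reps;
        }
        if (reps == 0) {
            return -1;
        }
        pos -= step;
        --reps;
    }
}
```

For example, `match_fixed_length_loop("aaaa", "a", "aa")` first consumes all four characters, then shifts back twice by one character before the tail matches, and returns 2.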

At least for greedy matching, though, we can't easily avoid pushing these stack frames while matching the next repetition, because they are still needed to restore the match state when backtracking from the last attempted match.

With this PR, from the second repetition on, the stack frames pushed while matching a repetition are popped from the stack afterwards without any further processing. Thus, the stack stops growing while the matcher processes the simple loop.
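The effect on stack size can be sketched with a small simulation. This is a simplified illustration under stated assumptions, not the PR's implementation: `Frame`, `LoopStats` and `run_simple_loop` are hypothetical names, each repetition matches a single character, and each attempt pushes exactly one frame. The frame is needed only if the attempt fails; after a successful repetition beyond the first, the stack is cut back to the size it had before the attempt.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical frame: remembers where to resume if the current attempt fails.
struct Frame {
    std::size_t saved_pos;
};

struct LoopStats {
    std::size_t repetitions; // successful repetitions of the loop body
    std::size_t peak_frames; // largest stack size observed
};

// Toy model of a greedy loop over the single character `unit`.
// With trim_frames == true, frames pushed during a successful repetition
// are dropped again from the second repetition on.
LoopStats run_simple_loop(std::string_view input, char unit, bool trim_frames) {
    std::vector<Frame> stack;
    LoopStats stats{0, 0};
    std::size_t pos = 0;

    while (pos < input.size()) {
        const std::size_t mark = stack.size(); // stack size before this attempt
        stack.push_back(Frame{pos});           // needed if this attempt fails
        stats.peak_frames = std::max(stats.peak_frames, stack.size());

        if (input[pos] != unit) {
            // The attempt failed: unwind using the frame that was just pushed.
            pos = stack.back().saved_pos;
            stack.pop_back();
            break;
        }

        ++pos;
        ++stats.repetitions;

        if (trim_frames && stats.repetitions >= 2) {
            stack.resize(mark); // pop without any further processing
        }
    }
    return stats;
}
```

In this model, matching 100 characters without trimming reaches a peak of 100 frames, while trimming keeps the peak at 2: the frame from the first repetition plus the frame of the attempt in progress.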

This is probably the most intricate PR since the start of the non-recursive matcher PR, because it keeps tampering with this stack and does not just pop from the stack, but even repeatedly modifies two special stack frames that were pushed earlier while processing the loop's _N_rep and _N_end_rep nodes. However, I believe the performance benefit for simple loops is worth this complication (especially because I hope to extend this optimization to even more loops that are currently not marked simple).

The two special stack frames (that can be recognized in the code by the assignment of opcode _Do_nothing to them in some cases) are used as follows:

Because positions must be shifted back during greedy matching, the iterators of the input string must be decremented during backtracking (either to calculate the position where the previous repetition stopped or to move the start and end positions of capturing groups accordingly). The standard requires that the provided iterators be bidirectional, so the matcher must always be able to perform such decrements. But I think the matcher has only required forward iterators in practice before this PR, and I think the matcher will enter an endless loop after this PR if assertions are disabled (because std::advance() will just not shift the iterator by a negative distance if the iterator isn't bidirectional). For this reason, this PR also adds static assertions checking the bidirectional iterator requirement.
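A check of this kind could look roughly as follows. This is a sketch under assumptions rather than the PR's code: the names `is_bidirectional_v` and `shift_back` are hypothetical, and the PR's actual assertions may be spelled differently. The static assertion turns a forward-only iterator into a compile-time error instead of a silent no-op at run time.

```cpp
#include <cassert>
#include <forward_list>
#include <iterator>
#include <string_view>
#include <type_traits>

// Hypothetical helper trait: true if It is at least a bidirectional iterator.
template <class It>
constexpr bool is_bidirectional_v =
    std::is_base_of_v<std::bidirectional_iterator_tag,
        typename std::iterator_traits<It>::iterator_category>;

// Hypothetical helper: moves pos back by a non-negative distance.
// Compilation fails for iterators that cannot be decremented.
template <class BidIt>
BidIt shift_back(BidIt pos, typename std::iterator_traits<BidIt>::difference_type distance) {
    static_assert(is_bidirectional_v<BidIt>,
        "backtracking by shifting positions requires bidirectional iterators");
    std::advance(pos, -distance);
    return pos;
}
```

With this in place, `shift_back` compiles for `std::string_view::iterator`, while instantiating it with `std::forward_list<char>::iterator` is rejected at compile time.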

Individual changes

Tests

The tests verify that backtracking from loops still works and capturing groups are set correctly despite these intricate stack manipulations. Backreferences are used to verify the contents of the capturing groups.
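To illustrate how a backreference can expose the contents of a capturing group after backtracking, here is a small check in the same spirit. It is a hypothetical example and not one of the PR's test cases; the helper `first_group_on_full_match` and the pattern `(.)*.\1` are chosen for illustration only. The loop `(.)*` must give back repetitions until the character captured by the last remaining repetition equals the final character of the input.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper: returns the text of capturing group 1 if the whole
// input matches the pattern, or std::nullopt if it does not match.
std::optional<std::string> first_group_on_full_match(
    const std::string& pattern, const std::string& input) {
    const std::regex re(pattern); // ECMAScript grammar by default
    std::smatch match;
    if (!std::regex_match(input, match, re)) {
        return std::nullopt;
    }
    return match[1].str();
}
```

For the input `abcb`, the loop has to backtrack to two repetitions so that group 1 holds `b`; a stale capture left over from a later repetition would hold `c` and make the backreference fail. For `abcd`, no number of repetitions satisfies the backreference, so the match fails.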

In the non-greedy case, failing to match a single repetition means that the matcher backtracks out of the loop completely. This is why a single test case verifying that the capturing group is unmatched is sufficient here.

In the greedy case, we have to verify that three different opcodes are handled correctly during unwinding, and backtracking after failing the last attempted repetition might stop at any repetition in between. Moreover, special handling is necessary when the maximum number of repetitions is reached or when backtracking beyond the second or the minimum repetition. The tests are chosen to provide coverage for all these cases.

Benchmark

| benchmark | before [ns] | after [ns] | speedup (before / after) |
| --- | ---: | ---: | ---: |
| `bm_match_sequence_of_as/"a*"/100` | 2148.44 | 1286.97 | 1.67 |
| `bm_match_sequence_of_as/"a*"/200` | 3379.61 | 2343.75 | 1.44 |
| `bm_match_sequence_of_as/"a*"/400` | 5580.36 | 4425.92 | 1.26 |
| `bm_match_sequence_of_as/"a*?"/100` | 1967.08 | 2040.32 | 0.96 |
| `bm_match_sequence_of_as/"a*?"/200` | 3717.91 | 3683.04 | 1.01 |
| `bm_match_sequence_of_as/"a*?"/400` | 6835.94 | 6975.45 | 0.98 |
| `bm_match_sequence_of_as/"(?:a)*"/100` | 2622.77 | 1757.81 | 1.49 |
| `bm_match_sequence_of_as/"(?:a)*"/200` | 4237.58 | 3247.08 | 1.31 |
| `bm_match_sequence_of_as/"(?:a)*"/400` | 7952.01 | 5998.88 | 1.33 |
| `bm_match_sequence_of_as/"(a)*"/100` | 3989.95 | 2786.7 | 1.43 |
| `bm_match_sequence_of_as/"(a)*"/200` | 6835.94 | 5312.5 | 1.29 |
| `bm_match_sequence_of_as/"(a)*"/400` | 32994.1 | 9416.81 | 3.50 |
| `bm_match_sequence_of_as/"(a)*?"/100` | 4541.02 | 3288.92 | 1.38 |
| `bm_match_sequence_of_as/"(a)*?"/200` | 7847.38 | 6417.41 | 1.22 |
| `bm_match_sequence_of_as/"(a)*?"/400` | 20402.9 | 11474.6 | 1.78 |
| `bm_match_sequence_of_as/"(?:b\|a)*"/100` | 3923.69 | 4589.84 | 0.85 |
| `bm_match_sequence_of_as/"(?:b\|a)*"/200` | 7149.83 | 8021.76 | 0.89 |
| `bm_match_sequence_of_as/"(?:b\|a)*"/400` | 13183.5 | 15066.9 | 0.87 |
| `bm_match_sequence_of_as/"(b\|a)*"/100` | 6417.41 | 6835.94 | 0.94 |
| `bm_match_sequence_of_as/"(b\|a)*"/200` | 16043.5 | 20996.1 | 0.76 |
| `bm_match_sequence_of_as/"(b\|a)*"/400` | 53013.4 | 52550.4 | 1.01 |
| `bm_match_sequence_of_as/"(a)(?:b\|a)*"/100` | 4464.29 | 4499.17 | 0.99 |
| `bm_match_sequence_of_as/"(a)(?:b\|a)*"/200` | 7672.99 | 8196.15 | 0.94 |
| `bm_match_sequence_of_as/"(a)(?:b\|a)*"/400` | 14125.2 | 14997.2 | 0.94 |
| `bm_match_sequence_of_as/"(a)(b\|a)*"/100` | 6406.25 | 6406.25 | 1.00 |
| `bm_match_sequence_of_as/"(a)(b\|a)*"/200` | 14648.4 | 14125.2 | 1.04 |
| `bm_match_sequence_of_as/"(a)(b\|a)*"/400` | 53013.4 | 56250 | 0.94 |
| `bm_match_sequence_of_as/"(a)(?:b\|a)*c"/100` | 5161.83 | 5859.38 | 0.88 |
| `bm_match_sequence_of_as/"(a)(?:b\|a)*c"/200` | 10253.9 | 9835.34 | 1.04 |
| `bm_match_sequence_of_as/"(a)(?:b\|a)*c"/400` | 18415.3 | 18415.3 | 1.00 |