<regex>: Use std::search() in skip heuristic by muellerj2 · Pull Request #5586 · microsoft/STL
Towards #5468. This replaces the weird usage of _Cmp_chrange() with a straightforward call to std::search() in _Matcher2::_Skip().
I fix a copy-paste mistake in the comment describing _Cmp_icase_translateleft as well.
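To illustrate the idea behind the std::search() change, here is a minimal sketch of a skip step that jumps to the first position where a literal prefix of the pattern could start. The helper names `first_candidate` and `first_candidate_icase` are hypothetical and do not appear in the STL sources; the actual _Matcher2::_Skip() operates on NFA nodes and iterator ranges, not on std::string.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not the STL's internal _Skip(): returns the offset of
// the first position in `text` where `literal` occurs, or text.size() if the
// literal does not occur at all.
std::size_t first_candidate(const std::string& text, const std::string& literal) {
    auto it = std::search(text.begin(), text.end(), literal.begin(), literal.end());
    return static_cast<std::size_t>(it - text.begin());
}

// Same idea with a binary predicate, assuming a simple ASCII case-insensitive
// comparison stands in for the regex traits' translation.
std::size_t first_candidate_icase(const std::string& text, const std::string& literal) {
    auto eq = [](char a, char b) {
        return std::tolower(static_cast<unsigned char>(a))
            == std::tolower(static_cast<unsigned char>(b));
    };
    auto it = std::search(text.begin(), text.end(), literal.begin(), literal.end(), eq);
    return static_cast<std::size_t>(it - text.begin());
}
```

A matcher built this way only needs to attempt a full match at the returned offset, instead of testing every position in the input.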
I also tried replacing _Cmp_chrange()'s implementation with a straightforward call to std::mismatch(), but in practice that seems to be a pessimization of about 10% (probably because the strings tend to be quite short).
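For reference, a std::mismatch()-based range comparison of the kind mentioned above might look roughly like the following. This is only a sketch: `matches_at` is a hypothetical name, and the real _Cmp_chrange() has a different signature and works with the regex traits rather than with plain character equality.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper, not the STL's _Cmp_chrange(): checks whether `literal`
// occurs in `text` starting exactly at offset `pos`.
bool matches_at(const std::string& text, std::size_t pos, const std::string& literal) {
    if (pos > text.size()) {
        return false;
    }
    // The four-iterator overload (C++14) stops at the end of the shorter range.
    auto result = std::mismatch(literal.begin(), literal.end(),
                                text.begin() + static_cast<std::ptrdiff_t>(pos), text.end());
    // The literal matched completely only if its whole range was consumed.
    return result.first == literal.end();
}
```

For short literals such as the four-character "bibe" in the benchmark, the fixed overhead of a generic algorithm call can plausibly outweigh any gain over a simple hand-written loop, which would be consistent with the slowdown observed.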
There will still be one follow-up to make an obvious improvement to the skip heuristic for regex and wregex in collate mode. Otherwise, I think this is basically it for simple improvements to the skip heuristic. A few opportunities remain that could lead to some improvement (handling several branches, avoiding walking the NFA on each _Skip() call, or avoiding comparing the NFA nodes already matched by _Skip() again in _Match_pat()), but they are not straightforward to implement.
Benchmark
| benchmark | before | after | speedup |
|---|---|---|---|
| bm_lorem_search/"^bibe"/2 | 28.2506 | 28.8783 | 0.98 |
| bm_lorem_search/"^bibe"/3 | 27.6228 | 29.82 | 0.93 |
| bm_lorem_search/"^bibe"/4 | 28.8783 | 32.4707 | 0.89 |
| bm_lorem_search/"bibe"/2 | 43492.7 | 2622.77 | 16.58 |
| bm_lorem_search/"bibe"/3 | 90680.8 | 5000 | 18.14 |
| bm_lorem_search/"bibe"/4 | 172631 | 9626.07 | 17.93 |
| bm_lorem_search/"(bibe)"/2 | 47538.5 | 4296.88 | 11.06 |
| bm_lorem_search/"(bibe)"/3 | 92071.8 | 8370.5 | 11.00 |
| bm_lorem_search/"(bibe)"/4 | 181370 | 15485.6 | 11.71 |
| bm_lorem_search/"(bibe)+"/2 | 64062.5 | 10253.9 | 6.25 |
| bm_lorem_search/"(bibe)+"/3 | 153460 | 20856.3 | 7.36 |
| bm_lorem_search/"(bibe)+"/4 | 249062 | 40108.8 | 6.21 |
| bm_lorem_search/"(?:bibe)+"/2 | 49178 | 4603.8 | 10.68 |
| bm_lorem_search/"(?:bibe)+"/3 | 94164.3 | 8998.29 | 10.46 |
| bm_lorem_search/"(?:bibe)+"/4 | 188354 | 17578.3 | 10.72 |
| bm_lorem_search/R"(\bbibe)"/2 | 96256.9 | 89979.2 | 1.07 |
| bm_lorem_search/R"(\bbibe)"/3 | 194972 | 188354 | 1.04 |
| bm_lorem_search/R"(\bbibe)"/4 | 374930 | 368369 | 1.02 |
| bm_lorem_search/R"(\Bibe)"/2 | 235395 | 222178 | 1.06 |
| bm_lorem_search/R"(\Bibe)"/3 | 404531 | 461498 | 0.88 |
| bm_lorem_search/R"(\Bibe)"/4 | 941265 | 983099 | 0.96 |
| bm_lorem_search/R"((?=....)bibe)"/2 | 48131.7 | 3138.95 | 15.33 |
| bm_lorem_search/R"((?=....)bibe)"/3 | 96256.9 | 6277.9 | 15.33 |
| bm_lorem_search/R"((?=....)bibe)"/4 | 179983 | 12207 | 14.74 |
| bm_lorem_search/R"((?=bibe)....)"/2 | 44327.8 | 2915.74 | 15.20 |
| bm_lorem_search/R"((?=bibe)....)"/3 | 87886.7 | 5580.36 | 15.75 |
| bm_lorem_search/R"((?=bibe)....)"/4 | 179983 | 10986.3 | 16.38 |
| bm_lorem_search/R"((?!lorem)bibe)"/2 | 45515.6 | 2999.44 | 15.17 |
| bm_lorem_search/R"((?!lorem)bibe)"/3 | 92071.8 | 5859.38 | 15.71 |
| bm_lorem_search/R"((?!lorem)bibe)"/4 | 188354 | 11160.7 | 16.88 |
Taken together, all improvements since #5509 have sped up searching for the regex (bibe)+ by a factor of about 450 and for (?:bibe)+ by a factor of about 1000 in this benchmark.