<regex>: wregex with regular expression [\w\s] fails to match some spaces · Issue #5243 · microsoft/STL
The regular expression [\w\s] fails to match whitespace characters with code points > 255.
Test case
#include <iostream>
#include <regex>
using namespace std;

int main() {
    // re1 uses \s alone; re2 combines \w and \s in one bracket expression.
    const wregex re1(LR"([\s])");
    const wregex re2(LR"([\w\s])");

    // U+0020 is below 256 and matches both patterns.
    cout << R"(U+0020 SPACE is matched by "[\s]": )" << regex_match(L" ", re1) << '\n';
    cout << R"(U+0020 SPACE is matched by "[\w\s]": )" << regex_match(L" ", re2) << '\n';

    // U+2028 is whitespace with a code point above 255.
    cout << R"(U+2028 LINE SEPARATOR is matched by "[\s]": )" << regex_match(L"\u2028", re1) << '\n';
    cout << R"(U+2028 LINE SEPARATOR is matched by "[\w\s]": )" << regex_match(L"\u2028", re2) << '\n';
}
https://godbolt.org/z/oEdTs3Th4
This prints:
U+0020 SPACE is matched by "[\s]": 1
U+0020 SPACE is matched by "[\w\s]": 1
U+2028 LINE SEPARATOR is matched by "[\s]": 1
U+2028 LINE SEPARATOR is matched by "[\w\s]": 0
Expected result
This should print:
U+0020 SPACE is matched by "[\s]": 1
U+0020 SPACE is matched by "[\w\s]": 1
U+2028 LINE SEPARATOR is matched by "[\s]": 1
U+2028 LINE SEPARATOR is matched by "[\w\s]": 1
Additional remarks
The underlying cause is #5242. Although I consider fixing #5242 to be ABI-breaking, I think this issue can be fixed without breaking ABI.
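The test case above checks only U+0020 and U+2028. To probe other code points, a minimal sketch along the following lines could be used. The helper names `matches_char` and `find_inconsistent` are hypothetical and not part of the STL. The sketch assumes that any character matched by [\s] should also be matched by [\w\s], and it reports the candidates for which that does not hold.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of the STL): returns true if the
// one-character string consisting of `ch` matches `re` in full.
bool matches_char(const std::wregex& re, wchar_t ch) {
    const std::wstring text(1, ch);
    return std::regex_match(text, re);
}

// Hypothetical helper: returns every candidate that is matched by [\s]
// but not by [\w\s]. Since [\w\s] is a superset of [\s], an empty
// result is the expected outcome.
std::vector<wchar_t> find_inconsistent(const std::vector<wchar_t>& candidates) {
    const std::wregex space_only(LR"([\s])");
    const std::wregex word_or_space(LR"([\w\s])");

    std::vector<wchar_t> inconsistent;
    for (const wchar_t ch : candidates) {
        if (matches_char(space_only, ch) && !matches_char(word_or_space, ch)) {
            inconsistent.push_back(ch);
        }
    }
    return inconsistent;
}
```

For example, `find_inconsistent({L' ', L'\u2028'})` should return an empty vector once this issue is fixed. Judging by the output reported above, it would presumably contain U+2028 on an affected build. Which code points count as whitespace depends on the implementation and the active locale, so results for characters above 255 may differ between standard libraries.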