<regex>: wregex with regular expression [\w\s] fails to match some spaces · Issue #5243 · microsoft/STL

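With wregex, the bracket expression [\w\s] fails to match whitespace characters whose code points are greater than 255, such as U+2028 LINE SEPARATOR, even though [\s] on its own matches them.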

Test case

#include <iostream>
#include <regex>

using namespace std;

int main() {
    const wregex re1(LR"([\s])");
    const wregex re2(LR"([\w\s])");
    cout << R"(U+0020 SPACE is matched by "[\s]": )" << regex_match(L" ", re1) << '\n';
    cout << R"(U+0020 SPACE is matched by "[\w\s]": )" << regex_match(L" ", re2) << '\n';
    cout << R"(U+2028 LINE SEPARATOR is matched by "[\s]": )" << regex_match(L"\u2028", re1) << '\n';
    cout << R"(U+2028 LINE SEPARATOR is matched by "[\w\s]": )" << regex_match(L"\u2028", re2) << '\n';
}

https://godbolt.org/z/oEdTs3Th4

This prints:

U+0020 SPACE is matched by "[\s]": 1
U+0020 SPACE is matched by "[\w\s]": 1
U+2028 LINE SEPARATOR is matched by "[\s]": 1
U+2028 LINE SEPARATOR is matched by "[\w\s]": 0

Expected result

This should print:

U+0020 SPACE is matched by "[\s]": 1
U+0020 SPACE is matched by "[\w\s]": 1
U+2028 LINE SEPARATOR is matched by "[\s]": 1
U+2028 LINE SEPARATOR is matched by "[\w\s]": 1
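The expected result follows from bracket-expression semantics: [\w\s] should match a character whenever [\w] or [\s] matches it. A minimal sketch of that property as a reusable check, assuming only the standard <regex> interface; matches_char and union_invariant_holds are hypothetical helpers for illustration, not part of any library:

```cpp
#include <regex>

// Hypothetical helper: true if the one-character string {ch} is
// fully matched by re.
bool matches_char(const std::wregex& re, wchar_t ch) {
    const wchar_t str[] = {ch, L'\0'};
    return std::regex_match(str, re);
}

// Hypothetical helper: a bracket expression matches a character if any
// class it lists matches, so "[\w\s]" should agree with
// "[\w]" || "[\s]". Returns true if the two agree for ch.
bool union_invariant_holds(wchar_t ch) {
    static const std::wregex word(LR"([\w])");
    static const std::wregex space(LR"([\s])");
    static const std::wregex word_or_space(LR"([\w\s])");
    const bool expected = matches_char(word, ch) || matches_char(space, ch);
    return matches_char(word_or_space, ch) == expected;
}
```

Going by the output reported above, union_invariant_holds(L'\u2028') would return false with the affected version, while union_invariant_holds(L' ') returns true.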

Additional remarks

The underlying cause is #5242. Although I consider fixing #5242 to be ABI-breaking, I think this issue can be fixed without breaking ABI.
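For illustration only, one way to express the union semantics without depending on how several classes are combined into a single class value is to look up and test each class separately, then OR the results. The sketch below does this with the standard std::regex_traits members lookup_classname and isctype; in_any_class is a hypothetical helper, and this is not a description of microsoft/STL's internals or of the eventual fix:

```cpp
#include <initializer_list>
#include <regex>
#include <string>

// Hypothetical helper: returns true if ch belongs to at least one of the
// named character classes (e.g. L"w", L"s"). Each class is looked up and
// tested on its own and the results are OR'ed, so the answer never relies
// on merging several classes into one char_class_type value.
bool in_any_class(wchar_t ch, std::initializer_list<const wchar_t*> names) {
    const std::regex_traits<wchar_t> traits{};
    for (const wchar_t* name : names) {
        const std::wstring n(name);
        // An unrecognized name yields an empty class, for which isctype is false.
        const auto cls = traits.lookup_classname(n.begin(), n.end());
        if (traits.isctype(ch, cls)) {
            return true;
        }
    }
    return false;
}
```

For example, in_any_class(L'_', {L"w", L"s"}) is true through the "w" class, and in_any_class(L'!', {L"w", L"s"}) is false. Whether per-class testing like this matches what a given STL's regex engine does internally is not something this issue describes.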