<regex>: Remove non-standard _Uelem from matcher by muellerj2 · Pull Request #5671 · microsoft/STL
Resolves #995 by eliminating `_Uelem` completely.
This PR does not fully complete support for custom character types: The internals of `regex_search` still place some additional requirements on such types. But apart from ADL resilience, support should be complete for `regex_match`, as the passing test confirms.
<regex> changes
- Remove usage of `_Uelem` from `_Is_word` by converting to `unsigned char` instead, combined with a check that re-conversion to the character type yields the same character again.
  - Taking this opportunity, `_STD`-qualify some `_Is_word` calls.
- Remove usage of `_Uelem` from `_Lookup_range` by relying on the `lt()` function in the char traits type instead.
  - Since this is possibly called in a tight loop, I added two overloads for `char` and `wchar_t` that avoid calling the `lt()` function for the standard traits type.
- Remove usage of `_Uelem` from `_Do_class()` by converting to `unsigned char` instead and checking that re-conversion to the character type yields the same character again.
- Remove the requirement for implicit conversions when comparing with ECMAScript line terminators by explicitly converting to `unsigned int` before comparing with the line terminator code points.
  - The round-trip conversion check makes sure that we don't accidentally confuse some code point values larger than 0x100000000 with the line terminators (although I don't know why anyone would ever need that).
  - Signed character types are unproblematic: The code points for line terminators are positive values in all standard C++ signed types, so they will never be sign-extended when converting to `unsigned int`.
  - Again, I added two overloads for `char` and `wchar_t`, but in this case mainly because their logic is noticeably simpler.
Test changes
The test coverage for custom character types largely remains rudimentary, but I added tests verifying that matching behaves as it should in places where the parser and matcher rely on potentially narrowing conversions.
The test covers the matcher changes in this PR: Word boundaries (`_Is_word`), single character matching (`_Matcher2::_Do_class`), character ranges (`_Lookup_range`) and line terminators (`_Is_ecmascript_line_terminator`). Additionally, some of the character ranges are chosen to validate that the implementation of `_Builder::_Add_range` remains correct for custom character types.
Beyond this, I made the following changes to tests:
- Removed workaround for #5563 (Suppress code analysis warning C6510 for basic_string).
- Removed workarounds for #995 (basic_regex wants regex_traits to provide things not required by [re.req]).
- `wrapped_wchar` was turned into a template `wrapped_character<Elem>` so that it can be used with an `unsigned long long` character type as well.
- Removed `operator wchar_t()` from `wrapped_character<Elem>` and replaced it by friend functions `convert_to<target_type>`, which are now called by the test traits classes.
  - This tightens the test coverage for `<regex>` as it removes implicit conversions from the custom types that only existed to support the implementation of the test traits classes.
  - Since ADL doesn't work for function calls with explicit template parameters in C++14 and C++17, I had to move the definitions of the custom character types before the definitions of the traits classes.
- The test char traits class was changed to accommodate an `unsigned long long`-like character type.
- The implementation of `transform_primary` in the test regex traits was replaced by a dummy implementation to allow the test to run under /clr:pure.
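The `convert_to` change in the test can be sketched as follows. The PR does not show the test code, so this only assumes its general shape: a wrapper without an implicit conversion operator, a conversion function template that the wrapper befriends, and a traits-like class defined afterwards (`sketch_traits` and `digit_value` are hypothetical). Before C++20, `convert_to<unsigned long long>(ch)` is only parsed as a call with explicit template arguments if ordinary lookup already finds a template named `convert_to` at that point, which is why the declaration has to come first.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the test's wrapped_character<Elem>: a character type
// that can only be created and read explicitly (no implicit conversions).
template <class Elem>
class wrapped_character {
public:
    wrapped_character() = default;
    explicit wrapped_character(Elem ch) : value_(ch) {}

    friend bool operator==(wrapped_character lhs, wrapped_character rhs) {
        return lhs.value_ == rhs.value_;
    }

    // Grants access to the namespace-scope conversion template defined below.
    template <class Target, class E>
    friend Target convert_to(wrapped_character<E> ch);

private:
    Elem value_{};
};

// Explicit conversion used instead of an `operator wchar_t()`.
// It is declared before the traits class so that, in C++14/17,
// `convert_to<T>(x)` inside the traits class parses as a template call.
template <class Target, class Elem>
Target convert_to(wrapped_character<Elem> ch) {
    return static_cast<Target>(ch.value_);
}

static_assert(!std::is_convertible<wrapped_character<wchar_t>, wchar_t>::value,
    "the wrapper must not convert implicitly");

// Hypothetical traits fragment: it reads characters only through convert_to.
template <class CharT>
struct sketch_traits {
    // Returns the decimal digit value of ch, or -1 if ch is not '0'..'9'.
    static int digit_value(CharT ch) {
        const unsigned long long code = convert_to<unsigned long long>(ch);
        return (code >= '0' && code <= '9') ? static_cast<int>(code - '0') : -1;
    }
};
```

Since the only way out of the wrapper is `convert_to`, any code path in the library that still relies on an implicit conversion of the character type fails to compile instead of silently passing.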