`<regex>`: `regex_traits::transform_primary` should yield primary sort keys appropriate for the imbued locale by muellerj2 · Pull Request #5444 · microsoft/STL
The actual work is done in two new functions, `__std_regex_transform_primary_char/wchar_t`, which are essentially 1:1 copies of `_Strxfrm()` and `_Wcsxfrm()` but pass different flags to `__crtLCMapStringA/W`. I also took the liberty of correcting the SAL annotations.
`__crtLCMapStringA/W` are declared in `awint.hpp`, which includes `yvals.h`. I'm uncertain whether this is the best approach, but I undefined `_ENFORCE_ONLY_CORE_HEADERS` so that `awint.hpp` can be included.
`transform_primary` has to check the types of the collate facets using RTTI, so I made the function always return an empty string when dynamic RTTI is disabled (that is, when `_CPPRTTI` is undefined). The implementation itself is heavily based on `collate::do_transform` (including the change in #5431). It also needs access to the internals of `collate`, so I made `_Regex_traits` a friend of it.
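The kind of RTTI check described above can be pictured roughly as follows. This is a minimal sketch in portable C++, not the code from this PR: `is_standard_collate` and `custom_collate` are hypothetical names chosen for illustration, and the actual implementation additionally relies on `collate` internals that are not shown here.

```cpp
#include <locale>
#include <typeinfo>

// Hypothetical helper: reports whether the collate facet installed in `loc`
// is exactly std::collate<CharT> or std::collate_byname<CharT>, as opposed to
// a user-derived facet whose sort-key format is unknown.
template <class CharT>
bool is_standard_collate(const std::locale& loc) {
    const std::collate<CharT>& facet = std::use_facet<std::collate<CharT>>(loc);
    const std::type_info& dynamic_type = typeid(facet); // needs RTTI to be enabled
    return dynamic_type == typeid(std::collate<CharT>)
        || dynamic_type == typeid(std::collate_byname<CharT>);
}

// A user-defined facet (hypothetical). Nothing can be assumed about the form
// of the sort keys it produces, so a caller would fall back to an empty key.
struct custom_collate : std::collate<char> {};
```

For a locale built as `std::locale(std::locale::classic(), new custom_collate)`, the helper returns `false`, which corresponds to the case where no primary sort key can be derived.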
There is a behavior change for the C locale. As I explained in more detail in #5435, the traits requirement in [re.req]/20 is actually misleading, since it is wrong for precisely one locale: the C locale (or the POSIX locale; see the collation order definition here: https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap07.html#tag_07_03_02_06). Since the equivalence classes are derived from POSIX, and the definition of `regex_traits::transform_primary` also alludes to "primary sort keys", which indirectly references terminology from the POSIX standard (https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap07.html#tag_07_03_02), I think we should do as POSIX says: "A" should not match `[[=a=]]`.
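A small probe makes the affected case concrete. This is only a sketch: `full_match` is a hypothetical helper, and it assumes the global locale is still the default "C" locale. `full_match("a", "[[=a=]]")` holds either way, while `full_match("A", "[[=a=]]")` is the case that becomes `false` with this change; implementations that follow [re.req]/20 literally may answer differently.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `text` as a whole matches `pattern`.
// std::regex picks up the global locale, which is "C" unless changed.
bool full_match(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}
```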
This has consequences:
- When I implemented `<regex>`: Properly parse and match collating symbols and equivalences #5392, I assumed [re.req]/20, so I didn't add any character translation using `translate` and `translate_nocase` when parsing equivalences. Now we have to add such logic in `_Parser::_Do_ex_class2` to handle potentially case-sensitive sort keys when case-insensitive regexes are used (otherwise "A" would even fail to match `[[=A=]]`).
- A number of test cases (some of my own making) failed because they all assumed that lowercase and uppercase characters are equivalent in the C locale.
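The translation step from the first consequence can be sketched with the public traits interface. This is an illustration under assumptions, not the logic in `_Parser::_Do_ex_class2`: `equivalence_key` is a hypothetical helper, and the two `bool` parameters stand in for the `icase` and `collate` syntax flags.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: computes the key used to compare a character against
// an equivalence class. The character is translated first so that the key
// honors case-insensitive (or collating) regexes even when the sort keys
// themselves are case-sensitive.
template <class Traits>
typename Traits::string_type equivalence_key(
    const Traits& traits, typename Traits::char_type ch, bool icase, bool collate) {
    if (icase) {
        ch = traits.translate_nocase(ch); // fold case before computing the key
    } else if (collate) {
        ch = traits.translate(ch);
    }
    return traits.transform_primary(&ch, &ch + 1);
}
```

With `icase` set, `'A'` and `'a'` are translated to the same character before the key is computed, so they produce the same key even in a locale where their primary sort keys differ.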
Since matching and parsing of equivalences no longer go through `collate::transform`, related tests no longer have to be skipped under IDL mismatch.