`<regex>`: Fix character range bounds in case-insensitive regexes by muellerj2 · Pull Request #5164 · microsoft/STL
This PR deals with three related problems for character ranges in case-insensitive mode:
- `_Builder::_Add_range()` casts the character bounds to `unsigned int`. When `char` is signed, characters with negative numeric values (such as bytes 0x80 and above) are converted to very large values by this cast. As a consequence, they are not added to the bitmap but to the `_Large` list of characters, so they are not found during matching. (Note the suspiciously different casts in the `else` branch for the case-sensitive case.)
- The parser fails to reject some empty ranges in case-insensitive mode, such as `[Z-a]`. This range is valid in case-sensitive mode because `Z` (0x5A) precedes `a` (0x61). With `icase`, however, it is equivalent to `[z-a]`, which is empty.
- When both the `collate` and `icase` flags are set, the bounds are unnecessarily translated by `_Traits.translate()` before being passed to `_Traits.translate_nocase()`. The Standard says in [re.grammar]/14.1 and 14.2 that calling `translate_nocase()` alone is sufficient. (See also `_Builder::_Add_char()`, which already follows the Standard in this regard.)
The PR moves the entire character translation into the parser so that empty ranges can be reliably diagnosed there in case-insensitive mode as well. It also fixes the unsigned cast and removes the unnecessary `translate()` call.
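Translating the bounds in the parser and diagnosing empty ranges on the translated values can be sketched roughly as follows. This is a simplified illustration, not the PR's implementation: `translate_bound` and `parse_range` are hypothetical names, only `char` with `std::regex_traits<char>` is handled, and comparing the bounds by their `unsigned char` values via `char_traits` is an assumption. Following the PR's reading of [re.grammar]/14.1 and 14.2, only `translate_nocase()` is applied whenever `icase` is set, even if `collate` is set as well.

```cpp
#include <optional>
#include <regex>
#include <string>
#include <utility>

namespace rc = std::regex_constants;

// Hypothetical helper: with icase set, apply only translate_nocase(),
// regardless of collate; otherwise translate() under collate; else no change.
char translate_bound(const std::regex_traits<char>& traits, rc::syntax_option_type flags, char c) {
    if ((flags & rc::icase) != rc::syntax_option_type{}) {
        return traits.translate_nocase(c);
    }
    if ((flags & rc::collate) != rc::syntax_option_type{}) {
        return traits.translate(c);
    }
    return c;
}

// Hypothetical parser step: translate both bounds first, then reject the range
// if the translated lower bound exceeds the translated upper bound.
// Assumption: bounds are ordered by their unsigned char values (via char_traits).
std::optional<std::pair<char, char>> parse_range(
    const std::regex_traits<char>& traits, rc::syntax_option_type flags, char first, char last) {
    const char lo = translate_bound(traits, flags, first);
    const char hi = translate_bound(traits, flags, last);
    if (std::char_traits<char>::to_int_type(lo) > std::char_traits<char>::to_int_type(hi)) {
        return std::nullopt;  // empty range, e.g. [Z-a] under icase becomes [z-a]
    }
    return std::make_pair(lo, hi);
}
```

Assuming the global locale is the default "C" locale, `parse_range(traits, std::regex_constants::icase, 'Z', 'a')` returns `std::nullopt`, while the same call with plain `std::regex_constants::ECMAScript` returns the bounds `Z` and `a`. Checking after translation is what lets the parser catch ranges that only become empty once case folding is applied.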
The test deliberately does not use any manual signed/unsigned casts, but leaves all of these casts to `char_traits`, to avoid getting the casts wrong in the same way in both `<regex>` and the test.
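In the same spirit as the test described above, a test can obtain characters from their code unit values through `std::char_traits<char>` instead of hand-written casts. `make_range_pattern` below is a hypothetical helper for illustration only and is not taken from the PR's test.

```cpp
#include <string>

// Hypothetical test helper: build a bracket expression "[x-y]" from two code
// unit values (0..255), delegating the int-to-char conversion to char_traits
// rather than writing static_casts that could mirror the library's mistakes.
std::string make_range_pattern(int first_byte, int last_byte) {
    using traits = std::char_traits<char>;
    std::string pattern = "[";
    pattern += traits::to_char_type(first_byte);
    pattern += '-';
    pattern += traits::to_char_type(last_byte);
    pattern += ']';
    return pattern;
}
```

Reading the bounds back with `std::char_traits<char>::to_int_type` recovers the original code unit values, independent of whether `char` is signed.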