This PR deals with three related problems for character ranges in case-insensitive mode:

1. `_Builder::_Add_range()` casts the character bounds to `unsigned int`. As a consequence, characters with negative numeric values are not added to the bitmap, but to the `_Large` list of characters instead, so they are not found during matching. (Note the suspiciously different casts in the `else` branch for the case-sensitive case.)
2. The parser fails to reject some empty ranges in case-insensitive mode, such as `[Z-a]` (= `[z-a]`).
3. When both the `collate` and `icase` flags are set, the bounds are first translated by an unnecessary call to `_Traits.translate()` before being passed to `_Traits.translate_nocase()`. The Standard says in [re.grammar]/14.1 and 14.2 that it is sufficient to call `translate_nocase()` only. (See also `_Builder::_Add_char()`, which already follows the Standard in this regard.)
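The effect of the cast in the first problem can be reproduced in isolation. The following is a minimal sketch, not the actual `<regex>` code: `CharClass`, `widen_by_cast`, and `widen_by_traits` are hypothetical names, and the 256-entry bitmap is an assumption made for illustration. It only shows how a direct cast to `unsigned int` pushes a negative `char` out of the bitmap range, while a conversion through `std::char_traits` does not.

```cpp
#include <bitset>
#include <string>
#include <vector>

// Hypothetical stand-in for a character class: small values go into a bitmap,
// everything else into an overflow list (loosely modeled on the description above).
struct CharClass {
    std::bitset<256> bitmap;
    std::vector<unsigned int> large;

    void add(unsigned int value) {
        if (value < bitmap.size()) {
            bitmap.set(value);
        } else {
            large.push_back(value);
        }
    }
};

// Direct cast: where char is signed, a negative value wraps around to a very
// large unsigned value and therefore misses the bitmap.
inline unsigned int widen_by_cast(char ch) {
    return static_cast<unsigned int>(ch);
}

// Conversion through char_traits: to_int_type() maps char via unsigned char,
// so the result stays within 0..255 for an 8-bit char.
inline unsigned int widen_by_traits(char ch) {
    return static_cast<unsigned int>(std::char_traits<char>::to_int_type(ch));
}
```

On a platform where `char` is signed, adding `widen_by_cast(static_cast<char>(0xE4))` lands in `large`, whereas `widen_by_traits` sets bit `0xE4` in the bitmap. For non-negative characters such as `'a'`, both conversions agree.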

The PR moves the entire character translation into the parser so that empty ranges can be reliably diagnosed there in case-insensitive mode as well. It also fixes the unsigned cast and removes the unnecessary translate() call.
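As a rough illustration of why translating the bounds before the emptiness check matters, here is a sketch under stated assumptions: `translate_bound` and `is_empty_range` are hypothetical helpers, not functions from the STL sources. The choice between `translate_nocase()` and `translate()` follows the description above rather than the actual parser code, and the comparison goes through `std::char_traits<char>::lt` to avoid manual signed/unsigned casts.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: translate one range bound according to the flags.
// With icase set, only translate_nocase() is applied; translate() is not
// called first. With collate alone, translate() is applied.
inline char translate_bound(const std::regex_traits<char>& traits, char ch,
                            bool icase, bool collate) {
    if (icase) {
        return traits.translate_nocase(ch);
    }
    if (collate) {
        return traits.translate(ch);
    }
    return ch;
}

// Hypothetical helper: a range first-last is empty if, after translation,
// the upper bound orders before the lower bound.
inline bool is_empty_range(char first, char last, bool icase, bool collate) {
    std::regex_traits<char> traits;
    first = translate_bound(traits, first, icase, collate);
    last = translate_bound(traits, last, icase, collate);
    // char_traits<char>::lt compares as unsigned char.
    return std::char_traits<char>::lt(last, first);
}
```

With the default "C" locale, `is_empty_range('Z', 'a', false, false)` reports a non-empty range, while `is_empty_range('Z', 'a', true, false)` reports an empty one, because `'Z'` is translated to `'z'` before the comparison.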

The test deliberately avoids manual signed/unsigned casts and leaves all such conversions to `char_traits`, so that `<regex>` and the test cannot get the casts wrong in the same way.

`<regex>`: Fix character range bounds in case-insensitive regexes by muellerj2 · Pull Request #5164 · microsoft/STL