Issue 35496: left-to-right violation in match order
See attached script, which is self-explanatory.
I'm glad one of us thinks so, because I find it clear as mud.
I spent way longer on this than I should have, but I simplified your sample code to the best of my ability. (See attached.) As far as I can tell, your code and mine do roughly the same thing, but please check that you agree.
I agree that with the IPV6 portion of the regex removed, it matches on "208.123.4.22", but with the IPV6 portion included, it matches on "::ffff:208.123.4.22". But I'm not sure that's a bug. I think it is working as designed. For example:
py> import re
py> text = 'green pepper'
py> re.search('pepper|green pepper', text).group(0)
'green pepper'
seems to be analogous to your example, but simpler. Do you agree? If not, it would also help a lot if you could find a simpler regex that demonstrates the issue. See http://www.sscce.org/
In your case, I believe that the rightmost alternative matches from position 1 of the text, while the leftmost alternative doesn't match until position 8. So starting from position 0, the IPV6 check matches first, and so wins.
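To make the position argument concrete, here is a minimal sketch. The original script isn't reproduced in this thread, so the subject string and both patterns are hypothetical stand-ins: a crude IPV4 pattern, and an "IPV6" alternative reduced to the `::ffff:` prefix followed by that IPV4 pattern. One character is placed before the address so that the positions line up with 1 and 8.

```python
import re

# Hypothetical, deliberately simplified patterns (not the reporter's originals).
IPV4 = r'\d{1,3}(?:\.\d{1,3}){3}'
IPV6_MAPPED = r'::ffff:' + IPV4  # stands in for the full IPV6 alternative

# Hypothetical subject: one leading character, so the mapped address
# begins at index 1 and its embedded IPV4 part at index 8.
text = '[::ffff:208.123.4.22]'

ipv4_only = re.search(IPV4, text)
both = re.search(IPV4 + '|' + IPV6_MAPPED, text)

print(ipv4_only.start(), ipv4_only.group(0))  # 8 208.123.4.22
print(both.start(), both.group(0))            # 1 ::ffff:208.123.4.22
```

Neither alternative matches at index 0. At index 1 the IPV4 alternative fails and the IPV6 alternative succeeds, so the search stops there and never reaches index 8.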
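It is possible you were expecting that the IPV4 check would be tested against position 0, then position 1, then position 2, and so on until the end of the string, and only then that the IPV6 check would be tested against position 0, then 1, and so on. That is not how alternation works: at each position, every alternative is tried, left to right, before the engine advances to the next position.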
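The order described in the previous paragraph, where one alternative is tried across the whole string before the next alternative gets a turn, is not what `re` does, but it can be emulated by running one search per alternative. This is a sketch under assumptions: the helper name `search_alternatives_in_order` is hypothetical, and the IPV4 and IPV6 patterns are simplified stand-ins, not the reporter's originals.

```python
import re

# Hypothetical, simplified stand-ins for the two alternatives.
IPV4 = r'\d{1,3}(?:\.\d{1,3}){3}'
IPV6_MAPPED = r'::ffff:' + IPV4


def search_alternatives_in_order(alternatives, text):
    """Try each alternative against the whole string before moving on.

    This is the "leftmost alternative first" order, not the
    "leftmost position first" order that re.search() uses for a|b.
    Returns the first match found, or None.
    """
    for pattern in alternatives:
        match = re.search(pattern, text)
        if match:
            return match
    return None


text = '[::ffff:208.123.4.22]'  # hypothetical subject string

# re's order: at each position, try every alternative, then advance.
print(re.search(IPV4 + '|' + IPV6_MAPPED, text).group(0))  # ::ffff:208.123.4.22

# Emulated order: IPV4 is searched across the whole string first.
print(search_alternatives_in_order([IPV4, IPV6_MAPPED], text).group(0))  # 208.123.4.22
```

The emulation costs one scan per alternative, and it cannot be packed into a single pattern, so it doesn't help when a program only accepts one regex.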
I'm very grateful for your time and attention, and sorry to have distracted you. You're correct when you say:
Steven D'Aprano: ...the rightmost alternative matches from position 1 of the text, while the leftmost alternative doesn't match until position 8. So starting from position 0, the IPV6 check matches first, and so wins.
I see now that what I was trying to do is simply not possible. I was looking for a way to do a kind of hat trick: to keep a matched substring ("::ffff:") out of matchObject.group(0). I guess I just don't get to do that.
It would be a nice feature to add: a "consume-and-forget" or "suppress" extension group type. Non-capturing groups forget about themselves, but they don't suppress their matched contents. It's a useful thing to be able to do because some software accepts regular expressions as configuration items but doesn't let you configure which of the groups within them to use. (I admit there aren't many occasions when suppression of substrings from group(0) is really necessary, but I think they do occur.)
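Python's `re` has no general consume-and-forget group (PCRE's `\K` is close to what is described), but when the text to suppress is a fixed-width prefix such as `::ffff:`, a lookbehind assertion `(?<=...)` gets part of the way: it requires the prefix to be present without including it in group(0). Lookbehind in `re` must be fixed-width, so this does not cover variable-length prefixes. The pattern and subject strings below are hypothetical.

```python
import re

# Hypothetical simplified IPV4 pattern.
IPV4 = r'\d{1,3}(?:\.\d{1,3}){3}'

# The lookbehind demands a literal '::ffff:' immediately before the
# address but does not consume it, so it stays out of group(0).
MAPPED_IPV4 = re.compile(r'(?<=::ffff:)' + IPV4)

mapped = MAPPED_IPV4.search('[::ffff:208.123.4.22]')
print(mapped.group(0), mapped.start())  # 208.123.4.22 8

# Without the prefix the assertion fails, so there is no match at all.
print(MAPPED_IPV4.search('[208.123.4.22]'))  # None
```

This pattern only accepts IPV4-mapped addresses; a bare IPV4 address would need its own alternative, and anything other than a fixed-width prefix still needs a capturing group, which is exactly what such configuration-driven software doesn't let you select.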