Issue 725106: SRE bug with capturing groups in alternatives in repeats

Created on 2003-04-21 17:16 by glchapman, last changed 2022-04-10 16:08 by admin. This issue is now closed.

Files
File name Uploaded Description
rep_alts_patch.txt glchapman,2003-04-21 17:16
Messages (3)
msg15562 - Author: Greg Chapman (glchapman) Date: 2003-04-21 17:16
SRE does not always correctly handle groups in alternatives in repeats. For example:

>>> re.match('((a)|b)*', 'abc').groups()
('b', '')

Group 2 should obviously never be an empty string. As I understand it, the rule for groups inside a repeat is that they should have the last value they matched during the iterations of the repeat (or None if they never match), so in the above case Group 2 should be 'a'.

To fix this, it appears that (when inside a repeat) the BRANCH opcode must call mark_save before trying an alternative and then call mark_restore if the alternative fails. The attached patch does this.
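The rule described above (a group inside a repeat keeps the last value it matched, or None if it never matched) can be checked directly. This is a minimal sketch, assuming a CPython release that already includes a fix for this issue; the variable names are illustrative only.

```python
import re

# Group 2 matched 'a' on the first iteration and did not match again,
# so it should keep 'a' rather than become an empty string.
matched_once = re.match('((a)|b)*', 'abc').groups()
print(matched_once)

# Here the (a) alternative never matches, so group 2 should be None.
never_matched = re.match('((a)|b)*', 'bbc').groups()
print(never_matched)
```

On a fixed interpreter the first call should report 'a' for group 2 and the second should report None.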
msg15563 - Author: Gustavo Niemeyer (niemeyer) * (Python committer) Date: 2003-04-27 12:35
Good catch Greg! Just for reference, here are two tests to confirm that you're right:

perl -e '"abc" =~ /^((a)|b)*/; print "$1 $2\n";'
echo "abc" | sed -r -e "s/^((a)|b)*/\1 \2
msg15564 - Author: Gustavo Niemeyer (niemeyer) * (Python committer) Date: 2003-04-27 14:26
Greg, I'm going to change the fix slightly, moving the mark_save() to outside of the for loop.
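The save/restore idea discussed in this thread can be illustrated with a toy model. This is only a sketch under stated assumptions: it is not the actual _sre.c code, and every name in it (alt_group_a, alt_b, branch, group2_after_repeat, the "g2_start"/"g2_end" keys) is hypothetical. It models '((a)|b)*' while tracking only the marks of group 2, saves the marks once before looping over the alternatives, and restores them after each alternative that fails.

```python
def alt_group_a(marks, text, pos):
    # Alternative "(a)": the start mark of group 2 is written
    # before the literal 'a' is tested.
    marks["g2_start"] = pos
    if text.startswith("a", pos):
        marks["g2_end"] = pos + 1
        return pos + 1
    return None  # on failure, g2_start is left modified


def alt_b(marks, text, pos):
    # Alternative "b": touches no marks.
    return pos + 1 if text.startswith("b", pos) else None


def branch(marks, text, pos, restore_on_failure):
    # Toy BRANCH: save the marks once, outside the loop over alternatives.
    saved = dict(marks)
    for alt in (alt_group_a, alt_b):
        end = alt(marks, text, pos)
        if end is not None:
            return end
        if restore_on_failure:
            # Undo whatever the failed alternative wrote.
            marks.clear()
            marks.update(saved)
    return None


def group2_after_repeat(text, restore_on_failure):
    # Toy repeat: apply the branch until it no longer matches.
    marks = {}
    pos = 0
    while True:
        end = branch(marks, text, pos, restore_on_failure)
        if end is None:
            break
        pos = end
    if "g2_start" not in marks or "g2_end" not in marks:
        return None
    return text[marks["g2_start"]:marks["g2_end"]]


without_restore = group2_after_repeat("abc", restore_on_failure=False)
with_restore = group2_after_repeat("abc", restore_on_failure=True)
print(repr(without_restore), repr(with_restore))
```

In this model, skipping the restore leaves a stale start mark from a failed "(a)" attempt paired with the old end mark, which yields the empty string from the original report; restoring the saved marks keeps 'a'.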
History
Date User Action Args
2022-04-10 16:08:16 admin set github: 38344
2003-04-21 17:16:52 glchapman create