[Python-bugs-list] [ python-Bugs-448951 ] Bug in re group handling
noreply@sourceforge.net noreply@sourceforge.net
Thu, 04 Oct 2001 21:39:24 -0700
- Previous message: [Python-bugs-list] [ python-Bugs-429357 ] non-greedy regexp duplicating match bug
- Next message: [Python-bugs-list] [ python-Bugs-210665 ] Compiling python on hpux 11.00 with threads (PR#360)
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Bugs item #448951, was opened at 2001-08-07 17:19
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=105470&aid=448951&group_id=5470

Category: Regular Expressions
Group: Python 2.1.1
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Fredrik Lundh (effbot)
Summary: Bug in re group handling
Initial Comment:
read it or run it!

import re,sys
print sys.version

Bug in 're' lib in Python 2.1
Consider this regexp : (?:([0-3]):)?0#
This will match one of
'0#', '0:0#', '1:0#', '2:0#', '3:0#'
The matching itself works fine, but group(1) should
be None for the '0#' case, and 'x' for the 'x:0#' cases.
For '0#', the optional '([0-3]):' part of the
r.e. (enclosed in (?: )) does not match anything, and that
is what contains group 1.
The actual result is, group(1) is '0' for both '0#' and '0:0#'.
Likely this happens because when '0' is seen, the state machine
cannot yet determine whether the ([0-3]): should be matched,
but has already seen enough of it to know what group(1) is, assuming
it does match. The match needs to be deleted once the containing
? fails. Indeed, if the group is expanded to include the ':',
as in '(?:([0-3]:))?0#', or just '([0-3]:)?0#', '0#' produces
group(1)=None as it should.
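
To make the comparison above easier to repeat, here is a minimal sketch that runs the reported pattern and the two workaround patterns side by side. It is written for a current Python 3 interpreter rather than 2.1, and the names PATTERNS, group1 and results are made up for this illustration. The values noted in the comments are what a correct engine should give; an affected 2.1 build would report '0' for the first pattern on '0#'.

```python
import re

# The pattern from the report, plus the two variants that pull ':' into the group.
PATTERNS = {
    "reported": r"(?:([0-3]):)?0#",
    "colon_in_group": r"(?:([0-3]:))?0#",
    "plain_group": r"([0-3]:)?0#",
}

def group1(pattern, text):
    """Return group(1) for a match anchored at the start of text, or 'no match'."""
    m = re.match(pattern, text)
    return m.group(1) if m else "no match"

# Collect group(1) for each pattern on each test string.
results = {
    name: {s: group1(p, s) for s in ("0#", "0:0#", "2:0#")}
    for name, p in PATTERNS.items()
}

for name, per_string in results.items():
    for s, g in per_string.items():
        # Correct behaviour: None for '0#' with every pattern.
        print(name, repr(s), "->", repr(g))
```

Note that the workaround patterns change what the group captures: group(1) then includes the trailing ':' (for example '2:' instead of '2'), so the caller has to strip it.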
Also, this is a good time to point out an error in the
docs. The docs say that group(n) returns -1 when the
group is in an unmatched part of the r.e.; actually
it returns None, which is more sensible.
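
As a small illustration of the None return value, the sketch below inspects a group that did not take part in the match. It assumes a current Python 3 interpreter where the group handling is correct; the variable names are chosen for this example only. A -1 does appear for such a group, but from start()/end()/span() rather than from group().

```python
import re

m = re.match(r"(?:([0-3]):)?0#", "0#")

# group(n) gives None for a group that did not participate in the match.
unmatched = m.group(1)

# The positional accessors are where -1 shows up for such a group.
position = m.start(1)
span = m.span(1)

# groups() accepts a default to substitute for non-participating groups.
defaults = m.groups(default="")

print(repr(unmatched), position, span, defaults)
```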
rexp = '(?:([0-3]):)?0#'
mat1 = re.compile(rexp)
print "Re = ", rexp
for str in [ '2:0#', '0:0#', '0#', '0:#', ':0#']:
    print "\n-----<<", str, ">>-----"
    mat = mat1.match(str)
    if mat:
        print " group(0) = ", mat.group(0)
        print " group(1) = ", mat.group(1)
    else:
        print " no match"

output is below
#################################
Python 2.1 (#15, Apr 16 2001, 18:25:49) [MSC 32 bit (Intel)] on win32
Re = (?:([0-3]):)?0#
-----<< 2:0# >>-----
group(0) = 2:0#
group(1) = 2
-----<< 0:0# >>-----
group(0) = 0:0#
group(1) = 0
-----<< 0# >>-----
group(0) = 0#
group(1) = 0
-----<< 0:# >>-----
no match
-----<< :0# >>-----
no match
############################################
Comment By: Matthew Mueller (donut)
Date: 2001-10-04 21:39
Message:
Logged In: YES
user_id=65253
I posted a fix as patch #468169 since I don't seem to have access to add it here.
Comment By: Gregory Smith (gregsmith)
Date: 2001-08-30 09:30
Message:
Logged In: YES
user_id=292741
This appears to be the same bug as #429357, albeit using a simpler test case. I have added a comment to that one.