[Python-bugs-list] [ python-Bugs-448951 ] Bug in re group handling

noreply@sourceforge.net noreply@sourceforge.net
Thu, 04 Oct 2001 21:39:24 -0700


Bugs item #448951, was opened at 2001-08-07 17:19
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=448951&group_id=5470

Category: Regular Expressions
Group: Python 2.1.1
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Fredrik Lundh (effbot)
Summary: Bug in re group handling

Initial Comment:
read it or run it!

    import re, sys
    print sys.version

Bug in 're' lib in Python 2.1

Consider this regexp: (?:([0-3]):)?0#

This will match one of '0#', '0:0#', '1:0#', '2:0#', '3:0#'.

The matching itself works fine, but group(1) should be None for the
'0#' case, and 'x' for the 'x:0#' cases. For '0#', the optional
'([0-3]):' part of the r.e. (enclosed in (?: )) does not match
anything, and that is what contains group 1.

The actual result is that group(1) is '0' for both '0#' and '0:0#'.

Likely this happens because when '0' is seen, the state machine
cannot yet determine whether the ([0-3]): should be matched, but has
already seen enough of it to know what group(1) is, assuming it does
match. The match needs to be deleted once the containing ? fails.
Indeed, if the group is expanded to include the ':', as in
'(?:([0-3]:))?0#', or just '([0-3]:)?0#', '0#' produces group(1)=None
as it should.
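
The three pattern variants above can be compared side by side. What
follows is a minimal sketch in Python 3 syntax (an assumption; the
report itself targets Python 2.1), and the helper name group1_for is
hypothetical. On an interpreter that discards the group once the
optional part fails, all three patterns should give None for group(1)
on '0#'; on the affected 2.1 build the first one reportedly gives '0'.

```python
import re

# The three pattern variants discussed in the report.
PATTERNS = [
    r'(?:([0-3]):)?0#',    # group 1 excludes the ':' (the reported case)
    r'(?:([0-3]:))?0#',    # group 1 widened to include the ':'
    r'([0-3]:)?0#',        # same, without the non-capturing wrapper
]

def group1_for(pattern, text):
    """Return group(1) of a match anchored at the start of text (hypothetical helper)."""
    m = re.match(pattern, text)
    if m is None:
        raise ValueError("no match for %r on %r" % (pattern, text))
    return m.group(1)

if __name__ == "__main__":
    for pat in PATTERNS:
        print(pat, "->", repr(group1_for(pat, '0#')))
```

Note that widening the group changes what it captures when it does
match: for '2:0#' the first pattern captures '2', the other two '2:'.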

Also, this is a good time to point out an error in the docs. The docs
say that group(n) returns -1 when the group is in an unmatched part of
the r.e.; actually it returns None, which is more sensible.
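
The None result can be checked directly. A minimal sketch, again in
Python 3 syntax (an assumption) and using a simpler pattern that is not
affected by the bug: group() gives None for a group that took no part
in the match, whereas in current Python the position methods start(),
end() and span() are the ones that use -1. The variable names are
illustrative only.

```python
import re

# '(a)?' is optional, so on the input 'b' group 1 takes no part in the match.
m = re.match(r'(a)?b', 'b')

unmatched_value = m.group(1)   # value of a non-participating group
unmatched_start = m.start(1)   # start position of a non-participating group
unmatched_span = m.span(1)     # (start, end) of a non-participating group

print(unmatched_value, unmatched_start, unmatched_span)
```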

    rexp = '(?:([0-3]):)?0#'
    mat1 = re.compile(rexp)

    print "Re = ", rexp

    for str in [ '2:0#', '0:0#', '0#', '0:#', ':0#']:
        print "\n-----<<", str, ">>-----"
        mat = mat1.match(str)
        if mat:
            print " group(0) = ", mat.group(0)
            print " group(1) = ", mat.group(1)
        else:
            print " no match"

output is below

#################################
Python 2.1 (#15, Apr 16 2001, 18:25:49) [MSC 32 bit (Intel)] on win32
Re = (?:([0-3]):)?0#

-----<< 2:0# >>-----
 group(0) = 2:0#
 group(1) = 2

-----<< 0:0# >>-----
 group(0) = 0:0#
 group(1) = 0

-----<< 0# >>-----
 group(0) = 0#
 group(1) = 0

-----<< 0:# >>-----
 no match

-----<< :0# >>-----
 no match
############################################
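
For anyone rerunning the script on a current interpreter, here is a
rough port to Python 3 syntax (an assumption; the original is Python
2.1 code). The function name run_cases is hypothetical. It returns the
results rather than only printing them, so they can be compared with
the listing above; the '0#' case is the one where the listing shows
group(1) = 0 although None is expected.

```python
import re

REXP = r'(?:([0-3]):)?0#'
CASES = ['2:0#', '0:0#', '0#', '0:#', ':0#']

def run_cases(pattern=REXP, cases=CASES):
    """Return (input, group(0), group(1)) per case; a failed match gives (input, None, None)."""
    compiled = re.compile(pattern)
    results = []
    for text in cases:
        mat = compiled.match(text)
        if mat:
            results.append((text, mat.group(0), mat.group(1)))
        else:
            results.append((text, None, None))
    return results

if __name__ == "__main__":
    for text, whole, first in run_cases():
        print("-----<< %s >>-----" % text)
        if whole is None:
            print(" no match")
        else:
            print(" group(0) = ", whole)
            print(" group(1) = ", first)
```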


Comment By: Matthew Mueller (donut)
Date: 2001-10-04 21:39

Message:
Logged In: YES
user_id=65253

I posted a fix as patch #468169 since I don't seem to have access to add it here.


Comment By: Gregory Smith (gregsmith)
Date: 2001-08-30 09:30

Message:
Logged In: YES
user_id=292741

This appears to be the same bug as #429357, albeit using a simpler test case. I have added a comment to that one.

