issue2636-24 : Code : Python

lp:~pythonregexp2.7/python/issue2636-24

Created by TimeHorse on 2008-09-24 and last modified on 2008-09-24

Currently, the Python regular expression engine drops characters when findall / finditer is used with an expression that contains a zero-width capture group. For example:

>>> [m.groups() for m in re.finditer(r'(^z*)|(\w+)', 'abc')]
[('', None), (None, 'bc')]

The 'a' has been lost because the engine first matches (^z*) with zero width and then consumes the current character (the 'a'). It then matches the rest of the string with (\w+), resulting in 'bc'. Fixing this takes two changes. First, the 'a' should not be consumed by the zero-width match of (^z*). On its own, however, that would lead to an infinite series of zero-width matches at the same position. So, second, each iteration needs internal state indicating whether a zero-width match is allowed. Initially, any position may match a zero-width expression once; when that same position is entered again, the 'zero-width match' flag is set and a subsequent zero-width match is disallowed. This item is based on the work from Issue 1647489.
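The flag logic described above can be sketched in pure Python. This is only an illustration of the idea, not the code in this branch: the helper name iter_matches and the representation of the pattern as an ordered list of separately compiled alternatives are assumptions made for the example, since the re module offers no public way to forbid an empty match at a given position.

```python
import re

def iter_matches(alternatives, string):
    """Yield (alternative_index, matched_text) pairs.

    Hypothetical helper: `alternatives` stands in for the branches of
    an alternation such as (^z*)|(\\w+), tried in order at each position.
    """
    pos = 0
    zero_width_seen = False  # True once an empty match was reported at pos
    while pos <= len(string):
        found = None
        for index, alt in enumerate(alternatives):
            m = alt.match(string, pos)
            if m is None:
                continue
            # A second zero-width match at the same position is disallowed.
            if m.end() == m.start() and zero_width_seen:
                continue
            found = (index, m)
            break
        if found is None:
            # Nothing matched here: move on and reset the flag.
            pos += 1
            zero_width_seen = False
            continue
        index, m = found
        yield index, m.group()
        if m.end() == pos:
            # Zero-width match: stay at pos without consuming a character.
            zero_width_seen = True
        else:
            pos = m.end()
            zero_width_seen = False

alts = [re.compile(r'^z*'), re.compile(r'\w+')]
result = list(iter_matches(alts, 'abc'))
print(result)
```

With this scheme the empty match of the first alternative is reported once at position 0, and the second alternative then matches from that same position, so the 'a' is kept.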

Get this branch:

bzr branch lp:~pythonregexp2.7/python/issue2636-24

Recent revisions

39039. By Jeffrey C. "The TimeHorse" Jacobs on 2008-09-21

39038. By Jeffrey C. "The TimeHorse" Jacobs on 2008-06-18

39037. By Jeffrey C. "The TimeHorse" Jacobs on 2008-06-11

39036. By Jeffrey C. "The TimeHorse" Jacobs on 2008-06-09

39035. By Jeffrey C. "The TimeHorse" Jacobs on 2008-06-03

39034. By Jeffrey C. "The TimeHorse" Jacobs on 2008-05-30

39033. By Jeffrey C. "The TimeHorse" Jacobs on 2008-05-29

39032. By Jeffrey C. "The TimeHorse" Jacobs on 2008-05-29

39031. By Jeffrey C. "The TimeHorse" Jacobs on 2008-05-24

39030. By Jeffrey C. "The TimeHorse" Jacobs on 2008-05-22

Branch metadata

Branch format: Branch format 6

Repository format: Bazaar pack repository format 1 with rich root (needs bzr 1.0)