issue2636-24 : Code : Python
lp:~pythonregexp2.7/python/issue2636-24
Created by TimeHorse on 2008-09-24 and last modified on 2008-09-24
Currently, the Python Regular Expression Engine drops characters when findall or finditer is used with an expression that has a zero-width capture group. For example:
>>> [m.groups() for m in re.finditer(r'(^z*)|(\w+)', 'abc')]
[('', None), (None, 'bc')]
The 'a' is lost because the engine first matches (^z*) with zero width and then consumes the current character (the 'a') in order to move forward. It then matches the rest of the string with (\w+), which yields 'bc'. Fixing this takes two steps. First, the 'a' should not be consumed by the zero-width match of (^z*). On its own, however, that change would produce an endless run of zero-width matches at the same position. So, second, each iteration needs internal state that indicates whether a zero-width match is allowed at the current position. Initially, a zero-width expression may match once at any position; when that same position is entered again, the 'zero-width match' flag is true and a further zero-width match there is disallowed. This item is based on the work from Issue 1647489.
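The scanning rule described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the branch's actual C implementation: the helper name `finditer_zero_width_once` is hypothetical, and the pattern is passed as a list of top-level alternatives so that "disallow a zero-width match here" can be emulated by falling through to the next alternative. A real engine would instead carry the flag into its backtracking.

```python
import re

def finditer_zero_width_once(alternatives, text):
    """Yield (alternative_index, matched_text) pairs.

    Hypothetical helper: emulates the proposed rule that a zero-width
    match is reported at most once per position and never consumes
    the character that follows it.
    """
    compiled = [re.compile(a) for a in alternatives]
    pos = 0
    zero_width_seen = False  # the 'zero-width match' flag for this position
    while pos <= len(text):
        hit = None
        for index, pattern in enumerate(compiled):
            m = pattern.match(text, pos)
            if m is None:
                continue
            if m.end() == pos and zero_width_seen:
                continue  # an empty match was already reported here
            hit = (index, m)
            break
        if hit is None:
            # Nothing (more) matches here: only now skip a character.
            pos += 1
            zero_width_seen = False
            continue
        index, m = hit
        yield index, m.group()
        if m.end() == pos:
            zero_width_seen = True  # stay at pos, forbid a second empty match
        else:
            pos = m.end()
            zero_width_seen = False

# The example from the description, split into its two alternatives.
result = list(finditer_zero_width_once([r'^z*', r'\w+'], 'abc'))
print(result)
```

With this rule the 'a' is kept: the first alternative matches with zero width at position 0, and the second alternative then matches 'abc' from the same position.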
Get this branch:
bzr branch lp:~pythonregexp2.7/python/issue2636-24
Branch information
Recent revisions
39039. By Jeffrey C. "The TimeHorse" Jacobs on 2008-09-21
39038. By Jeffrey C. "The TimeHorse" Jacobs on 2008-06-18
39037. By Jeffrey C. "The TimeHorse" Jacobs on 2008-06-11
39036. By Jeffrey C. "The TimeHorse" Jacobs on 2008-06-09
39035. By Jeffrey C. "The TimeHorse" Jacobs on 2008-06-03
39034. By Jeffrey C. "The TimeHorse" Jacobs on 2008-05-30
39033. By Jeffrey C. "The TimeHorse" Jacobs on 2008-05-29
39032. By Jeffrey C. "The TimeHorse" Jacobs on 2008-05-29
39031. By Jeffrey C. "The TimeHorse" Jacobs on 2008-05-24
39030. By Jeffrey C. "The TimeHorse" Jacobs on 2008-05-22
Branch metadata
Branch format: Branch format 6
Repository format: Bazaar pack repository format 1 with rich root (needs bzr 1.0)