[Python-Dev] \G (match last position) regex operator non-existant in python?

MRAB python at mrabarnett.plus.com
Sun Oct 29 12:54:09 EDT 2017


On 2017-10-29 12:27, Serhiy Storchaka wrote:

27.10.17 18:35, Guido van Rossum wrote:

The "why" question is not very interesting -- it probably wasn't in PCRE and nobody was familiar with it when we moved off PCRE (maybe it wasn't even in Perl at the time -- it was ~15 years ago).

I didn't understand your description of \G so I googled it and found a helpful StackOverflow article: https://stackoverflow.com/questions/21971701/when-is-g-useful-application-in-a-regex. From this I understand that when using e.g. findall() it forces successive matches to be adjacent.

This looks too Perlish to me. In Perl, regular expressions are part of the language syntax, and they can even contain Perl expressions. Arguments are passed to them implicitly (as they are to Perl's analogs of str.strip() and str.split()), and results are saved in global special variables. Loops can also be implicit.

It seems to me that \G makes sense only for re.findall() and re.finditer(), not for re.match(), re.search() or re.split().

In Python all of this is explicit. Compiled regular expressions are objects, and you can pass start and end positions to Pattern.match(). The Python equivalent of \G looks to me like:

    p = re.compile(...)
    i = 0
    while True:
        m = p.match(s, i)
        if not m: break
        ...
        i = m.end()

You're correct. \G matches at the start position, so .search(r'\G\w+') behaves the same as .match(r'\w+').
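For anyone wanting to try the explicit loop above, here is a minimal runnable sketch of it wrapped in a generator. The helper name adjacent_matches, the pattern and the sample string are made up for illustration, and the check for an empty match is an addition that is not in the quoted loop (without it, a pattern that can match the empty string would spin forever).

```python
import re

def adjacent_matches(pattern, s, pos=0):
    """Yield successive matches, each anchored where the previous one ended.

    Hypothetical helper, not part of the re module.
    """
    p = re.compile(pattern)
    while True:
        m = p.match(s, pos)
        # Stop when nothing matches at pos, or on an empty match
        # (which would otherwise never advance the position).
        if not m or m.end() == pos:
            break
        yield m
        pos = m.end()

text = "foo bar, baz qux"
anchored = [m.group() for m in adjacent_matches(r"\w+\s*", text)]
unanchored = re.findall(r"\w+\s*", text)

print(anchored)    # ['foo ', 'bar']
print(unanchored)  # ['foo ', 'bar', 'baz ', 'qux']
```

The anchored version stops at the comma because no match starts there, while findall() skips over it and carries on searching.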

findall and finditer perform a series of searches, but with \G at the start they'll perform a series of matches, each anchored at where the previous one ended.

One can also use the undocumented Pattern.scanner() method. Actually Pattern.finditer() is implemented as iter(Pattern.scanner().search). iter(Pattern.scanner().match) would return an iterator of adjacent matches.
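A small sketch of the scanner approach, as it behaves in CPython. Since scanner() is undocumented, this should be read as an illustration rather than a supported API; the pattern and input are invented. Note that the two-argument form of iter() needs an explicit sentinel, here None, which is what the scanner methods return once they fail.

```python
import re

p = re.compile(r"\d+,?")
s = "1,22,333;4,5"

# A series of searches: unmatched text (the ';') is skipped over.
searched = [m.group() for m in iter(p.scanner(s).search, None)]

# A series of anchored matches: iteration ends at the first position
# where the pattern fails to match (again the ';').
matched = [m.group() for m in iter(p.scanner(s).match, None)]

print(searched)  # ['1,', '22,', '333', '4,', '5']
print(matched)   # ['1,', '22,', '333']
```

The first list is what finditer() would produce; the second is the adjacent-matches behaviour that \G at the start of the pattern would give.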

I think it would be more Pythonic (and much easier) to add a boolean parameter to finditer() and findall() than introduce a \G operator.


