[Python-Dev] \G (match last position) regex operator nonexistent in python?

Ed Peschko horos22 at gmail.com
Fri Oct 27 13:05:28 EDT 2017


> From this I understand that when using e.g. findall() it forces successive matches to be adjacent.

Yes, I admit that this is a clearer description of what \G does. My only defense is that I wrote my description when it was late. :)

I can't stress enough how useful it is, especially for debugging regexes. Basically, if you are cutting a string into discrete chunks, you want to make sure that you aren't missing any chunks in the middle when you make the cut.

Without \G, you can miss large sections of the string, and it is easy to overlook. With \G, you are guaranteed to see exactly where your regex falls down. In addition, there are specific regexes that you can only write with \G (e.g. C parsers).
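To make the failure mode concrete with the stdlib `re` module (which has no `\G`): `findall()` silently skips any text the pattern doesn't match. Below is a minimal debugging sketch; `find_gaps` is a hypothetical helper, not part of `re`. It walks `finditer()` and reports every stretch of input that fell between two matches, i.e. every place where the adjacency `\G` would enforce was broken.

```python
import re


def find_gaps(pattern: re.Pattern, text: str) -> list[tuple[int, int, str]]:
    r"""Return (start, end, skipped_text) for every span findall() skips.

    Hypothetical helper: compares each match's start with the previous
    match's end, which is the adjacency that Perl's \G enforces.
    """
    gaps = []
    prev_end = 0
    for m in pattern.finditer(text):
        if m.start() > prev_end:
            gaps.append((prev_end, m.start(), text[prev_end:m.start()]))
        prev_end = m.end()
    # Text left over after the last match is also a gap.
    if prev_end < len(text):
        gaps.append((prev_end, len(text), text[prev_end:]))
    return gaps


token = re.compile(r"\d+,?")
text = "12,34,xx,56"
print(token.findall(text))     # ['12,', '34,', '56']; "xx," vanished silently
print(find_gaps(token, text))  # [(6, 9, 'xx,')]
```

Unlike a `\G`-anchored loop, which stops at the first point where the pattern fails, this keeps scanning, so it lists every skipped span in one pass.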

Anyway, I'll look at the regex module.

On Fri, Oct 27, 2017 at 8:35 AM, Guido van Rossum <guido at python.org> wrote:

The "why" question is not very interesting -- it probably wasn't in PCRE and nobody was familiar with it when we moved off PCRE (maybe it wasn't even in Perl at the time -- it was ~15 years ago).

I didn't understand your description of \G so I googled it and found a helpful StackOverflow article: https://stackoverflow.com/questions/21971701/when-is-g-useful-application-in-a-regex. From this I understand that when using e.g. findall() it forces successive matches to be adjacent.

In general this seems to be a unique property of \G: it preserves state from one match to the next. This will make it somewhat difficult to implement -- e.g. that state should probably be thread-local in case multiple threads use the same compiled regex. It's also unclear when that state should be reset. (Only when you compile the regex? Each time you pass it a different source string?) So I'm not sure it's reasonable to add. But I also don't see a reason why it shouldn't be added -- presuming we can decide on a good answer for the questions above about the "scope" of the anchor.

I think it's okay to start a discussion on bugs.python.org about the precise specification of \G for Python. OTOH I expect that most core devs won't find this a very interesting problem (Python relies on regexes for parsing a lot less than Perl does).

Good luck!

On Thu, Oct 26, 2017 at 11:03 PM, Ed Peschko <horos22 at gmail.com> wrote:

All,

Perl has a regex assertion (\G) that lets multiple-match regular expressions use the position where the last match ended. Perl's documentation puts it this way:

    \G  Match only at pos() (e.g. at the end-of-match position of prior m//g)

Anyway, this is exceedingly powerful for matching regularly structured free-form records, and I was really surprised to find out that Python does not have it. For example, if findall() supported it, it would be possible to write something like this (a quick and dirty ifconfig parser):

    pat = re.compile(r'\G(\S+)(.*?\n)(?=\S+|\Z)', re.S)

    val = """
    eth2      Link encap:Ethernet  HWaddr xx
              inet addr: xx.xx.xx.xx  Bcast:xx.xx.xx.xx  Mask:xx.xx.xx.xx
    ...
    lo        Link encap:Local Loopback
              inet addr:127.0.0.1  Mask:255.0.0.0
    """

    matches = re.findall(pat, val)

So why doesn't Python have this? Was it simply overlooked, or is there another way of doing the same thing with arbitrarily complex free-form records?

Thanks much.
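One stdlib alternative, offered as a sketch rather than a definitive answer: since Python 3.6, `re` rejects unknown ASCII-letter escapes such as `\G` with `re.error`, but `Pattern.match(string, pos)` only matches starting exactly at `pos`. A short loop that feeds each match's end back in as the next `pos` therefore gives the same adjacency guarantee. The names `RECORD`, `parse_records` and `SAMPLE` are hypothetical, and the sample assumes continuation lines are indented as in typical ifconfig output. Because the position lives in a local variable instead of on the compiled pattern, each call starts fresh and threads sharing `RECORD` cannot interfere with one another. That is one possible answer to the "scope" question raised above.

```python
import re

# The quoted pattern minus the \G: Pattern.match(text, pos) supplies the anchor.
RECORD = re.compile(r"(\S+)(.*?\n)(?=\S+|\Z)", re.S)


def parse_records(text: str) -> list[tuple[str, str]]:
    """Split text into contiguous (name, body) records.

    Raises ValueError at the first offset where no record starts, instead of
    silently skipping ahead the way findall() would.
    """
    pos = 0  # per-call state: nothing is stored on the compiled pattern
    records = []
    while pos < len(text):
        m = RECORD.match(text, pos)  # anchored at pos, like \G
        if m is None:
            raise ValueError(f"no record at offset {pos}: {text[pos:pos + 30]!r}")
        records.append((m.group(1), m.group(2)))
        pos = m.end()  # (\S+) is non-empty, so pos always advances
    return records


# Hypothetical sample in ifconfig layout; addresses are placeholders as in the email.
SAMPLE = (
    "eth2      Link encap:Ethernet  HWaddr xx\n"
    "          inet addr:xx.xx.xx.xx  Bcast:xx.xx.xx.xx  Mask:xx.xx.xx.xx\n"
    "lo        Link encap:Local Loopback\n"
    "          inet addr:127.0.0.1  Mask:255.0.0.0\n"
)

for name, body in parse_records(SAMPLE):
    print(name, "->", len(body.splitlines()), "lines")
```

One caveat: the quoted `val` begins with a newline right after the opening `"""`, so a strictly anchored parse (this loop, or `\G` in Perl) cannot match at offset 0. Calling `parse_records(val.lstrip("\n"))` would sidestep that.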



--
--Guido van Rossum (python.org/~guido)


