
Oh. Yes, that comes up about once every year or two. It seems Matthew isn't very interested in helping out with the port, and there are some concerns about backwards compatibility with the \`re\` module. I think it needs a champion!

On Fri, Oct 27, 2017 at 8:50 AM, Tim Peters <tim.peters@gmail.com> wrote:

Note that Matthew Barnett's \`regex\` module already supports \\G, and a
great many other features that weren't around 15 years ago ;-) either:

https://pypi.python.org/pypi/regex/

I haven't followed this in detail. I'm just surprised once per year
that it hasn't been folded into the core ;-)

\[nothing new below\]

On Fri, Oct 27, 2017 at 10:35 AM, Guido van Rossum <guido@python.org> wrote:
\> The "why" question is not very interesting -- it probably wasn't in PCRE and
\> nobody was familiar with it when we moved off PCRE (maybe it wasn't even in
\> Perl at the time -- it was \~15 years ago).
\>
\> I didn't understand your description of \\G so I googled it and found a
\> helpful StackOverflow article:
\> https://stackoverflow.com/questions/21971701/when-is-g-useful-application-in-a-regex.
\> From this I understand that when using e.g. findall() it forces successive
\> matches to be adjacent.
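The adjacency behaviour described above can be approximated with today's `re` by looping over `Pattern.match(string, pos)`, which only succeeds if the match starts exactly at `pos`. This is a minimal sketch: `findall_adjacent` is a hypothetical helper, not part of `re`, and its handling of empty matches (it simply stops) is a simplification rather than Perl's exact rule.

```python
import re

def findall_adjacent(pattern, text):
    """Rough emulation of findall() with a leading \\G: every match must start
    exactly where the previous one ended (position 0 for the first).
    Returns whole-match strings rather than findall()'s group tuples."""
    regex = re.compile(pattern)
    pos = 0
    results = []
    while pos <= len(text):
        m = regex.match(text, pos)  # match() only succeeds if it starts at pos
        if m is None:
            break                   # a gap: stop, as \G would
        results.append(m.group())
        if m.end() == pos:          # empty match: stop instead of looping forever
            break
        pos = m.end()
    return results

print(re.findall(r"\d", "12x34"))        # ['1', '2', '3', '4']
print(findall_adjacent(r"\d", "12x34"))  # ['1', '2']
```

Ordinary `findall()` skips over the `x` and keeps going; the anchored loop stops at the first gap.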
\>
\> In general this seems to be a unique property of \\G: it preserves \*state\*
\> from one match to the next. This will make it somewhat difficult to
\> implement -- e.g. that state should probably be thread-local in case
\> multiple threads use the same compiled regex. It's also unclear when that
\> state should be reset. (Only when you compile the regex? Each time you pass
\> it a different source string?)
\>
\> So I'm not sure it's reasonable to add. But I also don't see a reason why it
\> shouldn't be added -- presuming we can decide on a good answer to the
\> questions above about the "scope" of the anchor.
\>
\> I think it's okay to start a discussion on bugs.python.org about the precise
\> specification of \\G for Python. OTOH I expect that most core devs won't find
\> this a very interesting problem (Python relies on regexes for parsing a lot
\> less than Perl does).
\>
\> Good luck!
\>
\> On Thu, Oct 26, 2017 at 11:03 PM, Ed Peschko <horos22@gmail.com> wrote:
\>>
\>> All,
\>>
\>> Perl has a regex assertion (\\G) that lets multiple-match regular
\>> expressions resume at the position where the last match ended. Perl's
\>> documentation puts it this way:
\>>
\>> \\G Match only at pos() (e.g. at the end-of-match position of prior
\>> m//g)
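For comparison with Perl's pos(), the closest existing tool in Python is the `pos` argument of compiled patterns. `Pattern.match(string, pos)` is anchored at `pos`, whereas `Pattern.search(string, pos)` may find a match further along. Per the `re` documentation, `^` is not a substitute: it matches only at the real beginning of the string (or after a newline in MULTILINE mode), not at `pos`. A small illustration:

```python
import re

pat = re.compile(r"\d+")
s = "ab12cd34"

# match() with a pos argument only succeeds if the match starts exactly at pos,
# which is roughly the job \G does after a prior m//g in Perl.
print(pat.match(s, 2).group())   # 12
print(pat.match(s, 0))           # None
# search() from the same starting point may skip ahead.
print(pat.search(s, 0).span())   # (2, 4)
# A literal '^' only matches at the real start of the string, not at pos.
print(re.compile(r"^\d+").match(s, 2))  # None
```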
\>>
\>> Anyway, this is exceedingly powerful for matching regularly
\>> structured free-form records, and I was really surprised when I found
\>> out that Python did not have it. For example, if findall supported
\>> this, it would be possible to write things like this (a quick and
\>> dirty ifconfig parser):
\>>
\>> pat = re.compile(r'\\G(\\S+)(.\*?\\n)(?=\\S+|\\Z)', re.S)
\>>
\>> val = """
\>> eth2 Link encap:Ethernet HWaddr xx
\>> inet addr: xx.xx.xx.xx Bcast:xx.xx.xx.xx Mask:xx.xx.xx.xx
\>> ...
\>> lo Link encap:Local Loopback
\>> inet addr:127.0.0.1 Mask:255.0.0.0
\>> """
\>> matches = re.findall(pat, val)
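Until something like \G exists in `re`, the parser above can be written with an explicit position loop. This sketch makes two assumptions: the pattern is the one from the message with the `\G` dropped, and continuation lines are indented as in real ifconfig output (the `(?=\S+|\Z)` lookahead relies on that to find where a record ends). `parse_records` is a hypothetical helper, and the sample keeps the message's `xx` placeholders.

```python
import re

# The pattern from the message minus \G; match(text, pos) supplies the anchoring.
RECORD = re.compile(r"(\S+)(.*?\n)(?=\S+|\Z)", re.S)

def parse_records(text):
    """Split ifconfig-style output into (interface, body) pairs, requiring each
    record to begin exactly where the previous one ended."""
    records = []
    pos = 0
    while pos < len(text):
        m = RECORD.match(text, pos)
        if m is None:
            raise ValueError(f"unparsable input at offset {pos}")
        records.append((m.group(1), m.group(2)))
        pos = m.end()  # (\S+) consumes at least one char, so pos always advances
    return records

# Assumption: continuation lines are indented, as in real ifconfig output.
sample = (
    "eth2      Link encap:Ethernet  HWaddr xx\n"
    "          inet addr:xx.xx.xx.xx  Bcast:xx.xx.xx.xx  Mask:xx.xx.xx.xx\n"
    "lo        Link encap:Local Loopback\n"
    "          inet addr:127.0.0.1  Mask:255.0.0.0\n"
)
for name, body in parse_records(sample):
    print(name, body.split())
```

Unlike plain `findall()`, which silently skips text that doesn't start a record, this raises on the first gap, which is the guarantee \G would provide.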
\>>
\>> So, why doesn't Python have this? Is it something that was simply
\>> overlooked, or is there another way of doing the same thing with
\>> arbitrarily complex free-form records?
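As for another way: one option (not raised in the thread) is to skip matching whole records and instead split the text before every line that starts with a non-space character. A sketch, assuming the same indented layout as real ifconfig output; `split_records` is an illustrative name.

```python
import re

def split_records(text):
    """Split text into records at every newline that is followed by a
    non-whitespace character, i.e. before each unindented line."""
    return [r for r in re.split(r"\n(?=\S)", text.strip("\n")) if r]

sample = (
    "eth2      Link encap:Ethernet  HWaddr xx\n"
    "          inet addr:xx.xx.xx.xx  Mask:xx.xx.xx.xx\n"
    "lo        Link encap:Local Loopback\n"
    "          inet addr:127.0.0.1  Mask:255.0.0.0\n"
)
for record in split_records(sample):
    name, _, body = record.partition(" ")
    print(name, "->", " ".join(body.split()))
```

The trade-off is that splitting cannot reject malformed input: every character ends up in some record, whereas an anchored scan can stop at the first gap.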
\>>
\>> thanks much.
\>> \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_
\>> Python-Dev mailing list
\>> Python-Dev@python.org
\>> https://mail.python.org/mailman/listinfo/python-dev
\>> Unsubscribe:
\>> https://mail.python.org/mailman/options/python-dev/guido%40python.org
\>
\> --
\> --Guido van Rossum (python.org/\~guido)
\>

--Guido van Rossum (python.org/\~guido)