[Python-Dev] \G (match last position) regex operator non-existant in python?

Guido van Rossum guido at python.org
Fri Oct 27 11:57:57 EDT 2017


Oh. Yes, that is being discussed about once a year too. It seems Matthew isn't very interested in helping out with the port, and there are some concerns about backwards compatibility with the re module. I think it needs a champion!

On Fri, Oct 27, 2017 at 8:50 AM, Tim Peters <tim.peters at gmail.com> wrote:

Note that Matthew Barnett's regex module already supports \G, and a great many other features that weren't around 15 years ago ;-) either:

https://pypi.python.org/pypi/regex/

I haven't followed this in detail. I'm just surprised once per year that it hasn't been folded into the core ;-)

[nothing new below]

On Fri, Oct 27, 2017 at 10:35 AM, Guido van Rossum <guido at python.org> wrote:
> The "why" question is not very interesting -- it probably wasn't in PCRE and
> nobody was familiar with it when we moved off PCRE (maybe it wasn't even in
> Perl at the time -- it was ~15 years ago).
>
> I didn't understand your description of \G so I googled it and found a
> helpful StackOverflow article:
> https://stackoverflow.com/questions/21971701/when-is-g-useful-application-in-a-regex.
> From this I understand that when using e.g. findall() it forces successive
> matches to be adjacent.
>
> In general this seems to be a unique property of \G: it preserves state
> from one match to the next. This will make it somewhat difficult to
> implement -- e.g. that state should probably be thread-local in case
> multiple threads use the same compiled regex. It's also unclear when that
> state should be reset. (Only when you compile the regex? Each time you pass
> it a different source string?)
>
> So I'm not sure it's reasonable to add. But I also don't see a reason why it
> shouldn't be added -- presuming we can decide on good answer for the
> questions above about the "scope" of the anchor.
>
> I think it's okay to start a discussion on bugs.python.org about the precise
> specification of \G for Python. OTOH I expect that most core devs won't find
> this a very interesting problem (Python relies on regexes for parsing a lot
> less than Perl does).
>
> Good luck!
>
> On Thu, Oct 26, 2017 at 11:03 PM, Ed Peschko <horos22 at gmail.com> wrote:
>>
>> All,
>>
>> perl has a regex assertion (\G) that allows multiple-match regular
>> expressions to be able to use the position of the last match. Perl's
>> documentation puts it this way:
>>
>> \G Match only at pos() (e.g. at the end-of-match position of prior
>> m//g)
>>
>> Anyways, this is exceedingly powerful for matching regularly
>> structured free-form records, and I was really surprised when I found
>> out that python did not have it. For example, if findall supported
>> this, it would be possible to write things like this (a quick and
>> dirty ifconfig parser):
>>
>> pat = re.compile(r'\G(\S+)(.*?\n)(?=\S+|\Z)', re.S)
>>
>> val = """
>> eth2      Link encap:Ethernet HWaddr xx
>>           inet addr: xx.xx.xx.xx Bcast:xx.xx.xx.xx Mask:xx.xx.xx.xx
>> ...
>> lo        Link encap:Local Loopback
>>           inet addr:127.0.0.1 Mask:255.0.0.0
>> """
>> matches = re.findall(pat, val)
>>
>> So - why doesn't python have this? is it something that simply was
>> overlooked, or is there another method of doing the same thing with
>> arbitrarily complex freeform records?
>>
>> thanks much..
>> _______________________________________________
>> Python-Dev mailing list
>> Python-Dev at python.org
>> https://mail.python.org/mailman/listinfo/python-dev
>> Unsubscribe:
>> https://mail.python.org/mailman/options/python-dev/guido%40python.org
>
> --
> --Guido van Rossum (python.org/~guido)
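For what it's worth, the adjacency that \G gives you in Perl can be approximated with the stdlib re module today, because a compiled pattern's match() accepts a starting position and anchors there. The following is only a rough sketch, not a proposal for how \G should be specified: iter_adjacent is a hypothetical helper name, the sample text is a trimmed version of the ifconfig example above with the "..." line and the leading newline dropped, and the pattern is the one from the original post minus the \G. Because the caller carries the position, the compiled regex holds no state between matches, which avoids the thread-local and reset questions raised above.

```python
import re


def iter_adjacent(pattern, text, pos=0):
    """Yield successive matches, each starting where the previous one ended.

    Hypothetical helper (not part of the re module) that approximates
    Perl's \\G by calling pattern.match(text, pos) in a loop.
    """
    while pos < len(text):
        m = pattern.match(text, pos)  # match() anchors at pos, unlike search()
        if m is None or m.end() == pos:
            # No adjacent match, or an empty match that would never advance.
            break
        yield m
        pos = m.end()


# Same record pattern as in the original post, without the \G anchor.
record = re.compile(r'(\S+)(.*?\n)(?=\S+|\Z)', re.S)

# Assumed sample input: indented continuation lines, as ifconfig prints them.
val = (
    "eth2      Link encap:Ethernet HWaddr xx\n"
    "          inet addr: xx.xx.xx.xx Bcast:xx.xx.xx.xx Mask:xx.xx.xx.xx\n"
    "lo        Link encap:Local Loopback\n"
    "          inet addr:127.0.0.1 Mask:255.0.0.0\n"
)

# Interface names, one per record.
names = [m.group(1) for m in iter_adjacent(record, val)]
print(names)
```

On this input the loop yields one match per interface record and stops at the first position where the pattern fails to match, rather than skipping ahead the way findall() or finditer() would.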

--
--Guido van Rossum (python.org/~guido)


