[Python-Dev] \G (match last position) regex operator non-existent in Python?

Guido van Rossum guido at python.org
Fri Oct 27 11:57:57 EDT 2017


Oh. Yes, that is being discussed about once every year or two. It seems
Matthew isn't very interested in helping out with the port, and there are
some concerns about backwards compatibility with the `re` module. I think it
needs a champion!

On Fri, Oct 27, 2017 at 8:50 AM, Tim Peters <tim.peters at gmail.com> wrote:

> Note that Matthew Barnett's `regex` module already supports \G, and a
> great many other features that weren't around 15 years ago ;-) either:
>
>     https://pypi.python.org/pypi/regex/
>
> I haven't followed this in detail.  I'm just surprised once per year
> that it hasn't been folded into the core ;-)
>
> [nothing new below]
>
> On Fri, Oct 27, 2017 at 10:35 AM, Guido van Rossum <guido at python.org> wrote:
> > The "why" question is not very interesting -- it probably wasn't in PCRE and
> > nobody was familiar with it when we moved off PCRE (maybe it wasn't even in
> > Perl at the time -- it was ~15 years ago).
> >
> > I didn't understand your description of \G so I googled it and found a
> > helpful StackOverflow article:
> > https://stackoverflow.com/questions/21971701/when-is-g-useful-application-in-a-regex.
> > From this I understand that when using e.g. findall() it forces successive
> > matches to be adjacent.
> >
> > In general this seems to be a unique property of \G: it preserves *state*
> > from one match to the next. This will make it somewhat difficult to
> > implement -- e.g. that state should probably be thread-local in case
> > multiple threads use the same compiled regex. It's also unclear when that
> > state should be reset. (Only when you compile the regex? Each time you pass
> > it a different source string?)
> >
> > So I'm not sure it's reasonable to add. But I also don't see a reason why it
> > shouldn't be added -- presuming we can decide on a good answer for the
> > questions above about the "scope" of the anchor.
> >
> > I think it's okay to start a discussion on bugs.python.org about the precise
> > specification of \G for Python. OTOH I expect that most core devs won't find
> > this a very interesting problem (Python relies on regexes for parsing a lot
> > less than Perl does).
> >
> > Good luck!
> >
> > On Thu, Oct 26, 2017 at 11:03 PM, Ed Peschko <horos22 at gmail.com> wrote:
> >>
> >> All,
> >>
> >> Perl has a regex assertion (\G) that lets a multiple-match regular
> >> expression anchor at the position where the previous match ended. Perl's
> >> documentation puts it this way:
> >>
> >>     \G Match only at pos() (e.g. at the end-of-match position of prior m//g)
> >>
> >> Anyway, this is exceedingly powerful for matching regularly structured
> >> free-form records, and I was really surprised when I found out that
> >> Python did not have it. For example, if findall() supported this, it
> >> would be possible to write things like this (a quick and dirty ifconfig
> >> parser):
> >>
> >> import re
> >>
> >> # Hypothetical: today's re module rejects \G as a bad escape.
> >> pat = re.compile(r'\G(\S+)(.*?\n)(?=\S+|\Z)', re.S)
> >>
> >> val = """
> >> eth2      Link encap:Ethernet  HWaddr xx
> >>              inet addr: xx.xx.xx.xx  Bcast:xx.xx.xx.xx  Mask:xx.xx.xx.xx
> >> ...
> >> lo        Link encap:Local Loopback
> >>            inet addr:127.0.0.1  Mask:255.0.0.0
> >> """
> >> matches = pat.findall(val)
> >>
> >> So why doesn't Python have this? Is it something that was simply
> >> overlooked, or is there another way of doing the same thing with
> >> arbitrarily complex free-form records?
> >>
> >> Thanks much.
> >> _______________________________________________
> >> Python-Dev mailing list
> >> Python-Dev at python.org
> >> https://mail.python.org/mailman/listinfo/python-dev
> >> Unsubscribe:
> >> https://mail.python.org/mailman/options/python-dev/guido%40python.org
> >
> > --
> > --Guido van Rossum (python.org/~guido)
> >
>



-- 
--Guido van Rossum (python.org/~guido)
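On the question of "another method of doing the same thing": a common stdlib workaround is to call a compiled pattern's `match(string, pos)` method in a loop. That method only matches starting exactly at index `pos`, so feeding each match's end back in forces successive matches to be adjacent, roughly what \G gives inside Perl's m//g. The following is a minimal sketch, not a proposed \G implementation. It assumes Python 3.7+ (for the `re.Pattern`/`re.Match` names). `scan_adjacent` is a hypothetical helper name, not part of `re`, and the sample text is a simplified version of the quoted ifconfig output. Because the "last match position" lives in the caller's loop rather than on the shared compiled pattern, this approach sidesteps the thread-local-state question raised above.

```python
import re
from typing import Iterator


def scan_adjacent(pattern: re.Pattern, text: str, pos: int = 0) -> Iterator[re.Match]:
    # Yield matches of `pattern` where each one must begin exactly where the
    # previous one ended, roughly like \G in a Perl m//g loop.
    while pos <= len(text):
        # Pattern.match() only tries to match starting at index `pos`.
        m = pattern.match(text, pos)
        if m is None:
            return  # next text is not adjacent to the last match: stop here
        yield m
        if m.end() == pos:
            return  # empty match: stop rather than loop forever (a real \G needs a rule here)
        pos = m.end()  # the "last match position" lives here, not on the pattern


# Ed's record pattern with the unsupported \G dropped.
record = re.compile(r'(\S+)(.*?\n)(?=\S+|\Z)', re.S)

sample = (
    "eth2      Link encap:Ethernet  HWaddr xx\n"
    "          inet addr:xx.xx.xx.xx  Mask:xx.xx.xx.xx\n"
    "lo        Link encap:Local Loopback\n"
    "          inet addr:127.0.0.1  Mask:255.0.0.0\n"
)

print([m.group(1) for m in scan_adjacent(record, sample)])  # ['eth2', 'lo']

# With a leading newline (as in the quoted val), the anchored scan finds
# nothing, while findall() simply skips over the unmatched text.
gapped = "\n" + sample
print([m.group(1) for m in scan_adjacent(record, gapped)])  # []
print([name for name, _ in record.findall(gapped)])         # ['eth2', 'lo']
```

The second pair of calls shows the practical difference: findall() searches forward past text it cannot match, whereas an anchored scan stops at the first gap. That is also why a \G version of the quoted parser, run from position 0, would presumably need the leading newline in val stripped first.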