[Python-Dev] \G (match last position) regex operator non-existent in python?

Tim Peters tim.peters at gmail.com
Fri Oct 27 11:50:48 EDT 2017


Note that Matthew Barnett's `regex` module already supports \G, and a
great many other features that weren't around 15 years ago ;-) either:

    https://pypi.python.org/pypi/regex/

I haven't followed this in detail.  I'm just surprised once per year
that it hasn't been folded into the core ;-)

[nothing new below]

On Fri, Oct 27, 2017 at 10:35 AM, Guido van Rossum <guido at python.org> wrote:
> The "why" question is not very interesting -- it probably wasn't in PCRE and
> nobody was familiar with it when we moved off PCRE (maybe it wasn't even in
> Perl at the time -- it was ~15 years ago).
>
> I didn't understand your description of \G so I googled it and found a
> helpful StackOverflow article:
> https://stackoverflow.com/questions/21971701/when-is-g-useful-application-in-a-regex.
> From this I understand that when using e.g. findall() it forces successive
> matches to be adjacent.
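The stdlib `re` module has no \G, but this adjacency behaviour can be approximated with `Pattern.match(string, pos)`, which only tries to match at `pos`, by starting each attempt exactly where the previous match ended. This is a minimal sketch. It assumes that stopping at the first gap, or after an empty match, is the desired behaviour. `iter_adjacent` is a hypothetical helper name and is not part of `re`.

```python
import re
from typing import Iterator


def iter_adjacent(pattern: re.Pattern, text: str) -> Iterator[re.Match]:
    # Hypothetical helper: yield successive matches of `pattern`, each one
    # starting exactly where the previous one ended (findall() plus a
    # \G-style anchor). Stops at the first position where nothing matches.
    pos = 0
    while pos <= len(text):
        m = pattern.match(text, pos)  # match() is anchored at `pos`
        if m is None:
            break
        yield m
        if m.end() == pos:
            # Empty match: stop rather than loop forever at the same spot.
            break
        pos = m.end()


pat = re.compile(r"\d")
print([m.group() for m in iter_adjacent(pat, "123a45")])  # ['1', '2', '3']
print(pat.findall("123a45"))  # ['1', '2', '3', '4', '5']
```

Plain `findall()` skips over the `a` and keeps going. The anchored loop stops there, and that stop is the behaviour \G provides.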
>
> In general this seems to be a unique property of \G: it preserves *state*
> from one match to the next. This will make it somewhat difficult to
> implement -- e.g. that state should probably be thread-local in case
> multiple threads use the same compiled regex. It's also unclear when that
> state should be reset. (Only when you compile the regex? Each time you pass
> it a different source string?)
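The thread-safety and reset questions can be avoided if the "last match position" is stored outside the compiled pattern, in an object tied to one string. Perl takes a similar approach, attaching pos() to the string rather than to the regex. This is a sketch under stated assumptions. `Cursor` is a hypothetical class, not an existing `re` API, and it assumes that a failed match should leave the position unchanged.

```python
import re
from typing import Optional


class Cursor:
    # Hypothetical helper: holds the "last match position" for ONE string,
    # playing the role of Perl's pos(). Compiled patterns stay stateless and
    # can be shared between threads; each thread or string gets its own
    # Cursor, and a fresh Cursor starts again at position 0.
    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def match(self, pattern: re.Pattern) -> Optional[re.Match]:
        # Try `pattern` only at the current position, like a leading \G.
        m = pattern.match(self.text, self.pos)
        if m is not None:
            self.pos = m.end()  # advance only on success
        return m


NAME = re.compile(r"[A-Za-z_]\w*")
EQ = re.compile(r"\s*=\s*")
NUM = re.compile(r"\d+")

cur = Cursor("width = 80")
name, _, value = cur.match(NAME), cur.match(EQ), cur.match(NUM)
print(name.group(), value.group(), cur.pos)  # width 80 10
```

Each `Cursor` owns its position, so the compiled patterns can be shared freely, and a reset amounts to creating a new `Cursor`. Keeping `pos` unchanged after a failed match resembles Perl's `m//gc`. With a plain `m//g`, Perl resets `pos()` after a failed match.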
>
> So I'm not sure it's reasonable to add. But I also don't see a reason why it
> shouldn't be added -- presuming we can decide on a good answer for the
> questions above about the "scope" of the anchor.
>
> I think it's okay to start a discussion on bugs.python.org about the precise
> specification of \G for Python. OTOH I expect that most core devs won't find
> this a very interesting problem (Python relies on regexes for parsing a lot
> less than Perl does).
>
> Good luck!
>
> On Thu, Oct 26, 2017 at 11:03 PM, Ed Peschko <horos22 at gmail.com> wrote:
>>
>> All,
>>
>> Perl has a regex assertion (\G) that lets multiple-match regular
>> expressions anchor at the position where the last match ended. Perl's
>> documentation puts it this way:
>>
>>     \G Match only at pos() (e.g. at the end-of-match position of prior m//g)
>>
>> Anyway, this is exceedingly powerful for matching regularly
>> structured free-form records, and I was really surprised when I found
>> out that Python did not have it. For example, if findall() supported
>> this, it would be possible to write something like this (a quick-and-
>> dirty ifconfig parser):
>>
>> import re
>>
>> # Hypothetical: the stdlib re module rejects \G (re.error: bad escape).
>> pat = re.compile(r'\G(\S+)(.*?\n)(?=\S+|\Z)', re.S)
>>
>> val = """
>> eth2      Link encap:Ethernet  HWaddr xx
>>              inet addr: xx.xx.xx.xx  Bcast:xx.xx.xx.xx  Mask:xx.xx.xx.xx
>> ...
>> lo        Link encap:Local Loopback
>>            inet addr:127.0.0.1  Mask:255.0.0.0
>> """
>> matches = pat.findall(val)
>>
>> So, why doesn't Python have this? Was it simply overlooked, or is
>> there another way of doing the same thing with arbitrarily complex
>> free-form records?
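The stdlib `re` module offers a \G-free alternative for records shaped like the ifconfig output. Each record is anchored at the start of a line with `re.M`, and a lookahead ends it at the next line that begins in column 0. This minimal sketch assumes that, as in ifconfig output, only the first line of each record starts in column 0. `RECORD` and the sample text are illustrative.

```python
import re

# Each record starts with a non-blank first column (the interface name) and
# runs until the next line that starts in column 0, or the end of the text.
# re.M lets ^ match at every line start; re.S lets .*? cross newlines.
RECORD = re.compile(r"^(\S+)(.*?)(?=^\S|\Z)", re.M | re.S)

val = """
eth2      Link encap:Ethernet  HWaddr xx
          inet addr:xx.xx.xx.xx  Bcast:xx.xx.xx.xx  Mask:xx.xx.xx.xx
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
"""

# Map interface name -> the rest of its record, whitespace-trimmed.
records = {name: body.strip() for name, body in RECORD.findall(val)}
print(list(records))  # ['eth2', 'lo']
```

Unlike \G, this approach does not require the records to be contiguous. Any indented text before the first record is silently skipped instead of stopping the scan.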
>>
>> Thanks much.
>> _______________________________________________
>> Python-Dev mailing list
>> Python-Dev at python.org
>> https://mail.python.org/mailman/listinfo/python-dev
>> Unsubscribe:
>> https://mail.python.org/mailman/options/python-dev/guido%40python.org
>
> --
> --Guido van Rossum (python.org/~guido)
>

