[Python-Dev] \G (match last position) regex operator non-existent in python?

MRAB python at mrabarnett.plus.com
Sun Oct 29 12:54:09 EDT 2017


On 2017-10-29 12:27, Serhiy Storchaka wrote:
> 27.10.17 18:35, Guido van Rossum wrote:
>> The "why" question is not very interesting -- it probably wasn't in PCRE 
>> and nobody was familiar with it when we moved off PCRE (maybe it wasn't 
>> even in Perl at the time -- it was ~15 years ago).
>> 
>> I didn't understand your description of \G so I googled it and found a 
>> helpful StackOverflow article: 
>> https://stackoverflow.com/questions/21971701/when-is-g-useful-application-in-a-regex. 
>> From this I understand that when using e.g. findall() it forces 
>> successive matches to be adjacent.
> 
> This looks too Perlish to me. In Perl, regular expressions are part of
> the language syntax, and they can even contain Perl expressions.
> Arguments are passed to them implicitly (as they are to Perl's analogs
> of str.strip() and str.split()), and results are saved in global
> special variables. Loops can also be implicit.
> 
> It seems to me that \G makes sense only for re.findall() and
> re.finditer(), not for re.match(), re.search() or re.split().
> 
> In Python all this is explicit. Compiled regular expressions are
> objects, and you can pass start and end positions to Pattern.match().
> The Python equivalent of \G looks to me like:
> 
> p = re.compile(...)
> i = 0
> while True:
>       m = p.match(s, i)
>       if not m: break
>       ...
>       i = m.end()
> 
> 
You're correct. \G matches at the start position, so .search(r'\G\w+') 
behaves the same as .match(r'\w+').

findall and finditer perform a series of searches, but with \G at the 
start they'll perform a series of matches, each anchored at where the 
previous one ended.
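
To make the difference concrete, here is a minimal sketch using only the stdlib re module, which has no \G. It follows the loop quoted above; the helper name adjacent_matches and the sample string are made up for illustration, and stopping on an empty match is my own choice to avoid an infinite loop, not something \G itself requires.

```python
import re

def adjacent_matches(pattern, s, pos=0):
    """Yield successive matches, each anchored where the previous one ended."""
    p = re.compile(pattern)
    while True:
        m = p.match(s, pos)
        # Stop when the match fails; also stop on an empty match so the
        # loop cannot spin forever at the same position (an assumption).
        if not m or m.end() == pos:
            break
        yield m
        pos = m.end()

s = "12 34 x 56"

# Anchored series: stops at "x" because no match starts there.
anchored = [m.group(1) for m in adjacent_matches(r"(\d+)\s*", s)]

# Ordinary findall: a series of searches, so it skips over "x".
unanchored = re.findall(r"(\d+)\s*", s)

print(anchored)    # ['12', '34']
print(unanchored)  # ['12', '34', '56']
```

The anchored series ends at the first position where the pattern cannot match, whereas findall simply searches past it.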

> One can also use the undocumented Pattern.scanner() method. Actually
> Pattern.finditer() is implemented as iter(Pattern.scanner().search).
> iter(Pattern.scanner().match) would return an iterator of adjacent matches.
> 
> I think it would be more Pythonic (and much easier) to add a boolean
> parameter to finditer() and findall() than introduce a \G operator.
> 
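
For what it's worth, the scanner approach can be sketched as below. Pattern.scanner() is undocumented, so treat this as a CPython implementation detail that may change or be missing elsewhere. I'm assuming the two-argument form iter(callable, None), which stops at the first failed match; the pattern and sample string are just for illustration.

```python
import re

p = re.compile(r"(\d+)\s*")
s = "12 34 x 56"

# Each scanner keeps its own current position in the string.
# scanner.match only matches at that position, so the results are adjacent.
adjacent = [m.group(1) for m in iter(p.scanner(s).match, None)]

# scanner.search scans forward from that position, like finditer().
searched = [m.group(1) for m in iter(p.scanner(s).search, None)]

print(adjacent)  # ['12', '34']
print(searched)  # ['12', '34', '56']
```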

