[Python-Dev] re performance

Jakub Wilk jwilk at jwilk.net
Sun Jan 29 05:18:23 EST 2017


* Armin Rigo <armin.rigo at gmail.com>, 2017-01-28, 12:44:
>The theoretical kind of regexp is about giving a "yes/no" answer, whereas the 
>concrete "re" or "regexp" modules give a match object, which lets you ask for 
>the subgroups' location, for example. Strange as it may seem, I am not aware 
>of a way to do that using the linear-time approach of the theory---if it 
>answers "yes", then you have no way of knowing *where* the subgroups matched.
>
>Another issue is that the theoretical engine has no notion of 
>greedy/non-greedy matching.
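
For concreteness, here is a minimal sketch of the two features under
discussion, using Python's standard "re" module. The sample string and
patterns are made up for illustration.

```python
import re

text = "<a><b>"

# Greedy: ".+" takes as much as it can, so group 1 runs up to the last ">".
greedy = re.match(r"<(.+)>", text)
# Non-greedy: ".+?" stops at the first ">" that lets the match succeed.
lazy = re.match(r"<(.+?)>", text)

# The match object says *where* the subgroup matched, not just "yes/no".
print(greedy.group(1), greedy.span(1))
print(lazy.group(1), lazy.span(1))
```

The same pattern shape yields different subgroup locations depending on
greediness, which is exactly the information a pure yes/no matcher lacks.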

RE2 has linear execution time, and it supports both capture groups and 
greedy/non-greedy matching.

The implementation is explained in this article:
https://swtch.com/~rsc/regexp/regexp3.html
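
To illustrate that captures and greediness are compatible with a
backtracking-free simulation, below is a rough sketch of a Pike-style NFA
run in Python. It is not RE2's code. The instruction set, the hand-compiled
programs and the name pike_match are assumptions made up for this example.
All threads advance in lockstep, each carrying its own capture positions,
and thread order encodes greedy/non-greedy preference.

```python
def pike_match(prog, text):
    """Anchored match. Returns the tuple of saved positions, or None."""
    nslots = 1 + max((i[1] for i in prog if i[0] == "save"), default=-1)

    def add(threads, seen, pc, pos, caps):
        # Follow non-consuming instructions. `seen` admits each pc at most
        # once per input position, which bounds the work per character.
        if pc in seen:
            return
        seen.add(pc)
        op = prog[pc]
        if op[0] == "jmp":
            add(threads, seen, op[1], pos, caps)
        elif op[0] == "split":
            add(threads, seen, op[1], pos, caps)  # preferred branch first
            add(threads, seen, op[2], pos, caps)
        elif op[0] == "save":
            new = caps[:op[1]] + (pos,) + caps[op[1] + 1:]
            add(threads, seen, pc + 1, pos, new)
        else:
            threads.append((pc, caps))  # char / any / match

    current = []
    add(current, set(), 0, 0, (None,) * nslots)
    result = None
    for pos in range(len(text) + 1):
        nxt, seen = [], set()
        for pc, caps in current:
            op = prog[pc]
            if op[0] == "match":
                result = caps
                break  # drop threads with lower priority than this match
            if pos < len(text) and (
                op[0] == "any" or (op[0] == "char" and op[1] == text[pos])
            ):
                add(nxt, seen, pc + 1, pos + 1, caps)
        current = nxt
        if not current:
            break
    return result


# Hand-compiled "<(.+)>"; slots 0 and 1 hold the start and end of group 1.
GREEDY = [("char", "<"), ("save", 0), ("any",), ("split", 2, 4),
          ("save", 1), ("char", ">"), ("match",)]
# "<(.+?)>" differs only in which branch of the split is preferred.
LAZY = GREEDY[:3] + [("split", 4, 2)] + GREEDY[4:]

print(pike_match(GREEDY, "<a><b>"))
print(pike_match(LAZY, "<a><b>"))
```

Each input position visits every instruction at most once, so the number
of steps is bounded by len(prog) * len(text), ignoring the cost of copying
capture tuples.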

-- 
Jakub Wilk

