[Python-Dev] re performance

Serhiy Storchaka storchaka at gmail.com
Sun Jan 29 16:08:20 EST 2017


On 29.01.17 12:18, Jakub Wilk wrote:
> * Armin Rigo <armin.rigo at gmail.com>, 2017-01-28, 12:44:
>> The theoretical kind of regexp is about giving a "yes/no" answer,
>> whereas the concrete "re" or "regexp" modules give a match object,
>> which lets you ask for the subgroups' location, for example. Strange
>> as it may seem, I am not aware of a way to do that using the
>> linear-time approach of the theory---if it answers "yes", then you
>> have no way of knowing *where* the subgroups matched.
>>
>> Another issue is that the theoretical engine has no notion of
>> greedy/non-greedy matching.
>
> RE2 has linear execution time, and it supports both capture groups and
> greedy/non-greedy matching.
>
> The implementation is explained in this article:
> https://swtch.com/~rsc/regexp/regexp3.html

Not all features of Python regular expressions can be implemented with
linear complexity. It is possible to compile a subset of regular
expressions to an implementation with linear complexity. Patches are
welcome.
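
To make "a subset" concrete, here is a rough sketch of how a pattern could be
checked before being handed to a linear-time engine. It is not an existing
patch or a proposed API. The helper names (iter_ops, is_linear_candidate) are
made up for illustration, and the list of excluded constructs is an
assumption: backreferences, conditional group references and lookaround,
which RE2 does not support. The sketch also leans on CPython's private regex
parser (re._parser, or sre_parse on older versions), which is undocumented
and may change without notice.

```python
# Sketch only: relies on CPython's private regex parser, which is not a
# stable API and may change between versions.
try:
    from re import _parser as parser      # CPython 3.11+
except ImportError:
    import sre_parse as parser            # older CPython

# Assumption: these are the constructs treated as "not linear-time" here,
# namely backreferences, conditional group references, and lookaround.
BACKTRACKING_ONLY = {
    parser.GROUPREF,
    parser.GROUPREF_EXISTS,
    parser.ASSERT,
    parser.ASSERT_NOT,
}


def iter_ops(node):
    """Yield every opcode in a parsed pattern, recursing into subpatterns."""
    if isinstance(node, parser.SubPattern):
        for op, arg in node.data:
            yield op
            yield from iter_ops(arg)
    elif isinstance(node, (list, tuple)):
        # Arguments such as (min, max, subpattern) or branch lists.
        for item in node:
            yield from iter_ops(item)


def is_linear_candidate(pattern):
    """Return True if the pattern uses none of the constructs listed above."""
    parsed = parser.parse(pattern)
    return not any(op in BACKTRACKING_ONLY for op in iter_ops(parsed))


if __name__ == "__main__":
    for p in (r"(a|b)*c+?", r"(a+)b\1", r"foo(?=bar)"):
        print(p, is_linear_candidate(p))
```

A dispatcher built on such a check would send patterns that pass to the
linear-time implementation and fall back to the existing backtracking engine
for everything else.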


