Regular Expressions, Speed, Python, and NFA

breamoreboy at gmail.com breamoreboy at gmail.com
Sat Apr 15 10:54:55 EDT 2017


On Friday, April 14, 2017 at 4:12:27 PM UTC+1, Malik Rumi wrote:
> I am running some tests using the site regex101 to figure out the correct regexs to use for a project. I was surprised at how slow it was, constantly needing to increase the timeouts. I went Googling for a reason, and solution, and found Russ Cox’s article from 2007: https://swtch.com/~rsc/regexp/regexp1.html . I couldn’t understand why, if this was even remotely correct, we don’t use NFA in Python, which led me here:
> 
> https://groups.google.com/forum/#!msg/comp.lang.python/L1ZFI_R2hAo/C12Nf3patWIJ;context-place=forum/comp.lang.python where all of these issues were addressed. Unfortunately, this is also from 2007. 
> 
> BTW, John Machin in one of his replies cites Navarro’s paper, but that link is broken. Navarro’s work can now be found at http://citeseer.ist.psu.edu/viewdoc/download?doi=10.1.1.21.3112&rep=rep1&type=pdf But be forewarned, it is 68 pages of dense reading. I am not a computer science major. I am not new to Python, but I don’t think I’m qualified to take on the idea of creating a new NFA module for Python.  
> 
> I am not a computer science major. I am not new to Python, but I don’t think I’m qualified to take on the idea of creating a new NFA module for Python.  Nor am I entirely sure I want to try something new (to me) like TRE. 
> 
> Most threads related to this topic are older than 2007. I did find this https://groups.google.com/forum/#!searchin/comp.lang.python/regex$20speed%7Csort:relevance/comp.lang.python/O7rUwVoD2t0/NYAQM0mUX7sJ from 2011 but I did not do an exhaustive search. 
> 
> The bottom line is I wanted to know if anything has changed since 2007, and if there is a) any hope for improving regex speeds in Python, or b) some 3rd party module/library that is already out there and solves this problem? Or should I just take this advice?
> 
> 
> Thanks.

Check out https://pypi.python.org/pypi/regex and for a little light background reading please see http://bugs.python.org/issue2636

Kindest regards.

Mark Lawrence.



More information about the Python-list mailing list