regular expressions ... slow

Terry Reedy tjreedy at udel.edu
Mon Nov 17 17:13:19 EST 2008


Uwe Schmitt wrote:
> Hi,
> 
> Is anobody aware of this post:  http://swtch.com/~rsc/regexp/regexp1.html
> ?

Near the end:

While writing the text editor sam [6] in the early 1980s, Rob Pike wrote 
a new regular expression implementation, which Dave Presotto extracted 
into a library that appeared in the Eighth Edition. Pike's 
implementation incorporated submatch tracking into an efficient NFA 
simulation but, like the rest of the Eighth Edition source, was not 
widely distributed. Pike himself did not realize that his technique was 
anything new. Henry Spencer reimplemented the Eighth Edition library 
interface from scratch, but using backtracking, and released his 
implementation into the public domain. It became very widely used, 
eventually serving as the basis for the slow regular expression 
implementations mentioned earlier: Perl, PCRE, Python, and so on. (In 
his defense, Spencer knew the routines could be slow, and he didn't know 
that a more efficient algorithm existed. He even warned in the 
documentation, “Many users have found the speed perfectly adequate, 
although replacing the insides of egrep with this code would be a 
mistake.”) Pike's regular expression implementation, extended to support 
Unicode, was made freely available with sam in late 1992, but the 
particularly efficient regular expression search algorithm went 
unnoticed. The code is now available in many forms: as part of sam, as 
Plan 9's regular expression library, or packaged separately for Unix. 
Ville Laurikari independently discovered Pike's algorithm in 1999, 
developing a theoretical foundation as well [2].

So, depending on the license, there appears to be a potential basis for 
a Python unicode version.


> Are there any plans  to speed up Pythons regular expression module ?

Yes, but I don't know how much people with such plans have considered 
the adaptive approach recommended.

> is the example in this artricle too far from reality ???

The example is, but I don't think the problem illustrated is.

tjr







More information about the Python-list mailing list