regular expressions ... slow

MRAB google at mrabarnett.plus.com
Mon Nov 17 18:38:48 EST 2008


On Nov 17, 10:24 pm, Terry Reedy <tjre... at udel.edu> wrote:
> Jerry Hill wrote:
> > On Mon, Nov 17, 2008 at 4:37 PM, Uwe Schmitt
> > <rocksportroc... at googlemail.com> wrote:
> >> Hi,
>
> >> Is anybody aware of this post: http://swtch.com/~rsc/regexp/regexp1.html?
>
> > Yes, it's been brought up here, on python-dev and python-ideas several
> > times in the past year and a half.
>
> >> Are there any plans to speed up Python's regular expression module?
> >> Or is the example in this article too far from reality?
>
> > I don't think anyone has taken any concrete steps towards re-writing
> > the regular expression module.  My understanding from previous threads
> > on the topic is that the core developers would be willing to accept a
> > re-written regular expression engine, but none of them are interested
> > in doing it themselves.  The general consensus seemed to be that the
> > pathological cases highlighted in that article are not very common in the
> > real world, and that simply switching to the alternative approach
> > advocated there would require giving up things like backreferences
> > that are actually used in the real world, which is probably
> > unacceptable.
>
> > Some references:
> > http://mail.python.org/pipermail/python-dev/2007-March/072241.html
> > http://mail.python.org/pipermail/python-list/2007-February/427604.html
> > http://mail.python.org/pipermail/python-ideas/2007-April/000405.html
>
> > Personally, I know very little about the nitty gritty of regular
> > expression engines, but there's some reference material for you to
> > chew on.
>
> Searching the tracker for open items with 'regular expression' in the
> text brings up about 20 items to also consider.

Work is currently being done on the re module.

I don't think the DFA approach permits backreferences, capture
groups or non-greedy repetition, but it could certainly be used when
the regular expression doesn't require those features, so the
answer is definitely maybe! :-)
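
For anyone who wants to see the behaviour the article describes, here is
a rough sketch, assuming a reasonably recent Python 3. The names
pathological, time_match and DOUBLED are made up for this illustration
and are not part of the re module. It builds the article's "a?" * n
followed by "a" * n pattern, times a match against a string of n "a"s,
and shows a backreference, which is the kind of feature a pure DFA
can't express. Timings will differ from machine to machine.

```python
import re
import time


def pathological(n):
    """Build the article's pattern longhand, e.g. n=3 -> 'a?a?a?aaa'."""
    return "a?" * n + "a" * n


def time_match(n):
    """Match the pattern against 'a' * n; return (matched, seconds)."""
    pattern = re.compile(pathological(n))
    text = "a" * n
    start = time.perf_counter()
    matched = pattern.fullmatch(text) is not None
    return matched, time.perf_counter() - start


# A backreference: \1 must repeat exactly what group 1 captured.
# This is one of the features a pure DFA matcher cannot support.
DOUBLED = re.compile(r"(\w+) \1")


if __name__ == "__main__":
    # With a backtracking matcher the time should grow quickly with n.
    for n in (5, 10, 15, 20):
        matched, seconds = time_match(n)
        print(f"n={n:2d} matched={matched} time={seconds:.4f}s")

    print(DOUBLED.search("the the cat").group(1))
```

The match always succeeds; it's only the time taken that blows up as n
grows, because the matcher tries the "a?" choices one combination at a
time.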


