Speeding up multiple regex matches

Fredrik Lundh fredrik at pythonware.com
Fri Nov 18 13:09:49 EST 2005


"Talin" wrote:

> I've run into this problem a couple of times. Say I have a piece of
> text that I want to test against a large number of regular expressions,
> where a different action is taken based on which regex successfully
> matched. The naive approach is to loop through each regex, and stop
> when one succeeds. However, I am finding this to be too slow for my
> application -- currently 30% of the run time is being taken up in the
> regex matching.
>
> I thought of a couple of approaches, but I am not sure how to make them
> work:
>
> 1) Combine all of the regular expressions into one massive regex, and
> let the regex state machine do all the discriminating. The problem with
> this is that it gives you no way to determine which regex was the
> matching one.

use a capturing group for each alternative, and use the match object's
lastindex attribute to quickly find which alternative matched:

    http://docs.python.org/lib/match-objects.html

    lastindex

    The integer index of the last matched capturing group, or None if
    no group was matched at all.
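
for illustration, here is a minimal sketch of that approach. the RULES
table, the action labels, and the classify helper are hypothetical names
made up for this example; only re.compile, match, lastindex, and group
come from the re module. it assumes the individual patterns contain no
capturing groups of their own (use (?:...) inside them), so that group
number i maps directly to entry i - 1 in the table:

```python
import re

# Hypothetical patterns and action labels, for illustration only.
# Assumption: no pattern contains its own capturing groups; use (?:...)
# for grouping so that the group numbering stays one-per-alternative.
RULES = [
    (r"\d+", "number"),
    (r"[A-Za-z_]\w*", "name"),
    (r"\s+", "whitespace"),
]

# Wrap each alternative in exactly one capturing group, so that
# group i corresponds to RULES[i - 1].
COMBINED = re.compile("|".join("(%s)" % pattern for pattern, _ in RULES))


def classify(text):
    """Return (action, matched_text), or None if no alternative matches
    at the start of text."""
    m = COMBINED.match(text)
    if m is None:
        return None
    # lastindex is the number of the group that matched, which tells us
    # which alternative succeeded without testing each group in turn.
    return RULES[m.lastindex - 1][1], m.group(m.lastindex)
```

alternatives are tried left to right, so the order of RULES plays the
same role as the order of the original loop. whether this is actually
faster for a given set of patterns is worth measuring before committing
to it.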

also see:

    http://effbot.org/zone/xml-scanner.htm

</F>





