Speeding up multiple regex matches

Alex Martelli aleax at mail.comcast.net
Fri Nov 18 12:11:10 EST 2005


Talin <viridia at gmail.com> wrote:
   ...
> 1) Combine all of the regular expressions into one massive regex, and
> let the regex state machine do all the discriminating. The problem with
> this is that it gives you no way to determine which regex was the
> matching one.

Place each regex into a parenthesized group, and check which groups have
matched on the resulting matchobject:

>>> import re
>>> x=re.compile('(aa)|(bb)')
>>> mo=x.search('zaap!')
>>> mo.groups()
('aa', None)
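The same idea can be wrapped into a small helper that reports which of the original regexes matched. The sketch below is a variant of the technique rather than the exact code above: it wraps each pattern in a *named* group and reads `mo.lastgroup`, so that capturing groups inside the user's own patterns don't shift the numbering the way they would with plain `mo.groups()`. The helper names `combine` and `which_matched` and the `p<i>` group-naming scheme are hypothetical. It assumes the patterns use no numeric backreferences (`\1` etc.), since wrapping them renumbers their groups.

```python
import re

def combine(patterns):
    """Join patterns into one alternation, wrapping each in a named group."""
    # Hypothetical naming scheme: pattern number i is captured by group "p<i>".
    combined = "|".join(f"(?P<p{i}>{p})" for i, p in enumerate(patterns))
    return re.compile(combined)

def which_matched(compiled, text):
    """Return (pattern_index, matched_text) for the leftmost match, or None."""
    mo = compiled.search(text)
    if mo is None:
        return None
    # lastgroup names the group that closed last; each wrapper group encloses
    # any inner groups of its pattern, so this is always a "p<i>" wrapper.
    index = int(mo.lastgroup[1:])
    return index, mo.group(mo.lastgroup)

patterns = ["aa", "bb", r"c(d+)"]
rx = combine(patterns)
print(which_matched(rx, "zaap!"))   # (0, 'aa')
print(which_matched(rx, "xxcddd"))  # (2, 'cddd')
```

Using `lastgroup` turns the "which one matched?" question into a single lookup instead of a scan over every group.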

There's a limit of 99 groups, so if you have an unbounded number of
regexes to start with, you'll have to split them up 99 or fewer at a
time, but that shouldn't be impossibly hard.
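A minimal sketch of the splitting step follows. It assumes each pattern contributes exactly one capturing group (its wrapper), i.e. the patterns have no capturing groups of their own; otherwise the chunk size must be reduced to stay under the limit. Because each chunk is searched separately, the code keeps the leftmost match across chunks and, on ties, the lowest pattern index, which mimics what a single big alternation would return. The names `MAX_GROUPS`, `compile_in_chunks` and `first_match` are hypothetical. Note that the 99-group ceiling belongs to the `re` module of that era; since Python 3.5 the number of capturing groups is no longer limited to 100, so chunking is only needed on older versions.

```python
import re

MAX_GROUPS = 99  # the limit cited in the post (older versions of re)

def compile_in_chunks(patterns, chunk_size=MAX_GROUPS):
    """Compile patterns into a list of (offset, combined_regex) pairs."""
    chunks = []
    for start in range(0, len(patterns), chunk_size):
        part = patterns[start:start + chunk_size]
        # One named wrapper group per pattern; names restart in every chunk.
        combined = "|".join(f"(?P<p{i}>{p})" for i, p in enumerate(part))
        chunks.append((start, re.compile(combined)))
    return chunks

def first_match(chunks, text):
    """Return (pattern_index, matched_text) for the leftmost match, or None."""
    best = None  # (start_position, pattern_index, matched_text)
    for offset, rx in chunks:
        mo = rx.search(text)
        # Strict "<" keeps the earlier chunk on ties, like alternation order.
        if mo is not None and (best is None or mo.start() < best[0]):
            index = offset + int(mo.lastgroup[1:])
            best = (mo.start(), index, mo.group())
    return None if best is None else best[1:]

patterns = [f"w{n}x" for n in range(250)]
chunks = compile_in_chunks(patterns)
print(len(chunks))                         # 3
print(first_match(chunks, "xx w200x yy"))  # (200, 'w200x')
```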


Alex


