Regexp optimization question

Magnus Lie Hetland mlh at furu.idi.ntnu.no
Fri Apr 23 11:47:09 EDT 2004


In article <c69b42$9f6eq$1 at ID-99293.news.uni-berlin.de>, William Park wrote:
>Magnus Lie Hetland <mlh at furu.idi.ntnu.no> wrote:
>> Any ideas?
>
>Few concrete examples, perhaps?  Sadly, my telepathetic power is not
>what it used to be...

Well... It's hard to give specific examples, as the actual patterns
will be user-supplied.

But in order to explain what I'm doing, I could give a rather generic
example.

I might have a bunch of regexps like 'foo1', 'foo2', 'foo3', ...,
'foo500' (these would be more complex, of course).
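
For concreteness, assume something along these lines (the 'foo'
patterns are just placeholders for the real, user-supplied ones):

  import re

  patterns = ['foo1', 'foo2', 'foo3']   # really ~500 user-supplied regexps
  pats = [re.compile(p) for p in patterns]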

Now I can do something like this:

  hits = []
  for pat in pats:
      hits.extend(pat.finditer(text))
  # Maybe sort them in order of occurrence
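
(The sorting would be something like this -- just a sketch that orders
the matches by their start offsets:)

  hits.sort(key=lambda m: m.start())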

*Or* I can do something like this:

  # Join the pattern source strings into one big alternation and compile it
  bigPat = re.compile('(' + '|'.join([p.pattern for p in pats]) + ')')
  hits = list(bigPat.finditer(text))

The latter is *much* faster -- but only if I forgo named groups.
Without them, I can't find out which patterns matched at which
locations. (I can, as suggested, use .lastindex to find *one* of them,
but not all, as I need.)
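
For reference, the named-group variant I mean would look roughly like
this (the group names are invented just for the example, and it assumes
the individual patterns don't define named groups of their own):

  # Wrap each pattern in its own named group -- this is the slow variant
  named = ['(?P<g%d>%s)' % (i, p.pattern) for i, p in enumerate(pats)]
  bigPat = re.compile('|'.join(named))

  # lastgroup names the alternative that matched at each position, but
  # overlapping patterns at the same spot are still lost
  hits = [(m.start(), m.lastgroup) for m in bigPat.finditer(text)]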

I'm sure there are plenty of bottlenecks in my code, but this seems to
be one of them, and it's slow even if I run it alone (without any of
the rest of my parsing code).

-- 
Magnus Lie Hetland              "Wake up!"  - Rage Against The Machine
http://hetland.org              "Shut up!"  - Linkin Park


