Regexp optimization question
Magnus Lie Hetland
mlh at furu.idi.ntnu.no
Fri Apr 23 11:47:09 EDT 2004
In article <c69b42$9f6eq$1 at ID-99293.news.uni-berlin.de>, William Park wrote:
>Magnus Lie Hetland <mlh at furu.idi.ntnu.no> wrote:
>> Any ideas?
>
>Few concrete examples, perhaps? Sadly, my telepathetic power is not
>what it used to be...
Well... Hard to give specific examples, as the specifics will be
user-specified.
But in order to explain what I'm doing, I could give a rather generic
example.
I might have a bunch of regexps like 'foo1', 'foo2', 'foo3', ...,
'foo500' (these would be more complex, of course).
Now I can do something like this:
hits = []
for pat in pats:
    hits.extend(re.finditer(pat, text))
# Maybe sort them in order of occurrence
*Or* I can do something like this:
bigPat = re.compile('(' + '|'.join(pats) + ')')
hits = list(bigPat.finditer(text))
The latter is *much* faster -- but only if I forgo named groups.
And without them, I can't find out which patterns matched at which
locations. (I can, as suggested, use .lastindex to find *one* of them,
but not all, as I need.)
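[One approach that came up in the thread: wrap each alternative in its own
unnamed group, then use each match's .lastindex to map it back to the
pattern that fired. A minimal sketch, with made-up stand-in patterns for
the user-specified ones; it assumes the individual patterns contain no
capturing groups of their own, which would shift the group numbering.]

import re

# Hypothetical stand-ins for the user-specified patterns.
pats = ['foo1', 'foo2', 'ba[rz]']

# Groups are numbered left to right, so alternative i gets group i + 1.
bigPat = re.compile('|'.join('(%s)' % p for p in pats))

text = 'foo2 bar foo1 baz'

# For each match, lastindex is the number of the group that matched,
# so pats[m.lastindex - 1] is the pattern responsible for it.
hits = [(m.start(), pats[m.lastindex - 1]) for m in bigPat.finditer(text)]

# finditer scans left to right, so hits come out in order of occurrence.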
I'm sure there are plenty of bottlenecks in my code, but this seems to
be one of them, and it's slow even if I run it alone (without any of
the rest of my parsing code).
--
Magnus Lie Hetland "Wake up!" - Rage Against The Machine
http://hetland.org "Shut up!" - Linkin Park