passing multiple strings to string.find()
François Pinard
pinard at iro.umontreal.ca
Sat Aug 9 11:55:03 EDT 2003
[Fredrik Lundh]
> Francois Pinard wrote:
> > Given the above,
> >
> > build_regexp(['this', 'that', 'the-other'])
> >
> > yields the string 'th(?:is|at|e\\-other)', which one may choose to
> > `re.compile' before use.
> the SRE compiler looks for common prefixes, so "th(?:is|at|e\\-other)" is
> no different from "this|that|the-other" on the engine level.
Thanks for the note. So the `build_regexp' function is not useful after
all. It was written, indirectly, to work around a speed problem in the GNU
regexp engine, but it seems the Python regexp engine already handles this.
As I wrote earlier, I first saw Emacs Lisp `regexp-opt' used within `enscript'.
A speed comparison shows that both methods perform roughly the same. One
small difference remains: when one of the words is a prefix of another,
`build_regexp' automatically matches the longest one, while a naive regexp
made with '|'.join(words) matches whichever word happens to be listed
first. Of course, this is easily solved by sorting, then reversing, the
word list before producing the naive regexp.
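A minimal sketch of the prefix issue, using Python's `re' module. This is not `build_regexp' itself. The word list (which adds "the" as a prefix of "the-other") and the names `naive' and `longest_first' are hypothetical. Wrapping each word in re.escape is also an assumption, added so that any regexp metacharacters in the words are matched literally.

```python
import re

# Hypothetical word list: "the" is a prefix of "the-other".
words = ["this", "that", "the", "the-other"]

# Naive alternation: re tries the alternatives left to right and keeps the
# first one that matches, so "the" wins over "the-other".
naive = re.compile("|".join(map(re.escape, words)))

# Sorting in reverse order places every word before any of its prefixes
# (a prefix always sorts before its extensions), so longer words are tried first.
longest_first = re.compile("|".join(map(re.escape, sorted(words, reverse=True))))

print(naive.match("the-other").group())          # the
print(longest_first.match("the-other").group())  # the-other
```

sorted(words, reverse=True) gives the same order as sorting and then reversing the list, which here is this, the-other, the, that.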
--
François Pinard http://www.iro.umontreal.ca/~pinard