passing multiple strings to string.find()

François Pinard pinard at iro.umontreal.ca
Sat Aug 9 11:55:03 EDT 2003


[Fredrik Lundh]

> Francois Pinard wrote:

> > Given the above,
> >
> >    build_regexp(['this', 'that', 'the-other'])
> >
> > yields the string 'th(?:is|at|e\\-other)', which one may choose to
> > `re.compile' before use.

> the SRE compiler looks for common prefixes, so "th(?:is|at|e\\-other)" is
> no different from "this|that|the-other" on the engine level.

Thanks for the note.  So the `build_regexp' function is not useful after
all.  It was indirectly written around a speed problem in the GNU regexp
engine, but seemingly, the Python regexp engine knows better already.  As I
wrote earlier, I first saw Emacs Lisp `regexp-opt' used within `enscript'.

A speed comparison between both methods shows that they are fairly
equivalent.  A small difference is that `build_regexp', when one of
the words is a prefix of another, automatically recognises the longest one,
while a naive regexp of '|'.join(words) recognises whichever matching word
is listed first.  Of course, this is easily solved by sorting, then
reversing the word list before producing the naive regexp.
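
To make the prefix point concrete, here is a minimal sketch of the naive
approach, with and without the sort-then-reverse step.  The helper name
`naive_regexp' and the sample words are only illustrative assumptions; they
are not part of `build_regexp' or of any library.

```python
import re

def naive_regexp(words):
    # Hypothetical helper: sort, then reverse, so that any word is
    # listed before its own prefixes ('then' comes before 'the').
    ordered = sorted(words, reverse=True)
    return '|'.join(re.escape(word) for word in ordered)

words = ['the', 'then', 'this']

# Alternatives are tried left to right, so list order decides the match.
unsorted_pattern = re.compile('|'.join(re.escape(word) for word in words))
sorted_pattern = re.compile(naive_regexp(words))

first = unsorted_pattern.match('then').group()   # stops at the prefix 'the'
second = sorted_pattern.match('then').group()    # finds the longer 'then'
```

Reverse lexicographic order is enough here because a word always sorts
after each of its prefixes, so reversing puts the longer word first.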

-- 
François Pinard   http://www.iro.umontreal.ca/~pinard




