88k regex = RuntimeError

Tue Feb 14 10:17:08 EST 2006

Tim N. van der Leeuw wrote:

> This is basically the same idea as what I tried to describe in my
> previous post but without any samples.
> I wonder if it's more efficient to create a new list using a
> list-comprehension, and checking each entry against the 'wanted' set,
> or to create a new set which is the intersection of set 'wanted' and
> the iterable of all matches...
> 
> Your sample code would then look like this:
> 
> >>> import re
> >>> r = re.compile(r"\w+")
> >>> file_content = "foo bar-baz ignored foo()"
> >>> wanted = set(["foo", "bar", "baz"])
> >>> found = wanted.intersection(name for name in r.findall(file_content))

The generator expression is redundant, since intersection() accepts any
iterable. Just

found = wanted.intersection(r.findall(file_content))

> >>> print found
> set(['baz', 'foo', 'bar'])
> >>>
> 
> Anyone who has an idea what is faster? (This dataset is so limited that
> it doesn't make sense to do any performance-tests with it)

I guess that your approach would be a bit faster, though most of the time
will be spent on I/O anyway. The result would be slightly different (a list
comprehension keeps duplicates, the intersection does not), and again yours
(without duplicates) seems more useful.
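
To make the difference concrete, here is a small sketch using the sample
data from above. The names as_list and as_set are mine, for illustration
only, and I have not timed either variant:

```python
import re

r = re.compile(r"\w+")
file_content = "foo bar-baz ignored foo()"
wanted = set(["foo", "bar", "baz"])

# List comprehension: one entry per occurrence, in text order,
# so a name that appears twice in the text shows up twice.
as_list = [name for name in r.findall(file_content) if name in wanted]

# Set intersection: each wanted name at most once, in no particular order.
as_set = wanted.intersection(r.findall(file_content))

print(as_list)
print(sorted(as_set))
```

So the list is the one to use if the number or order of occurrences
matters, and the set if you only care which names turn up at all.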

However, I'm not sure whether the OP would rather stop at the first match,
or needs a match object and not just the text. In either case, a lazy
generator of match objects would do:

matches = (m for m in r.finditer(file_content) if m.group(0) in wanted)
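
A rough sketch of how that generator might be consumed, again with the
sample data from above. It assumes a Python that has the next() builtin
with a default argument; on older versions you would call matches.next()
and catch StopIteration instead. The name first is just for illustration:

```python
import re

r = re.compile(r"\w+")
file_content = "foo bar-baz ignored foo()"
wanted = set(["foo", "bar", "baz"])

# Lazy: the text is not scanned until the generator is consumed.
matches = (m for m in r.finditer(file_content) if m.group(0) in wanted)

# Stop at the first wanted match; None if there is no such match.
first = next(matches, None)
if first is not None:
    # A match object gives the position as well as the text.
    print(first.group(0))
    print(first.span())

# The generator picks up where it left off if more matches are needed.
rest = [m.group(0) for m in matches]
```

Because finditer() is lazy too, stopping after the first hit means the
rest of the text is never searched.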

Peter