Match 2 words in a line of file

Steven D'Aprano steve at REMOVE.THIS.cybersource.com.au
Sat Jan 20 06:09:19 EST 2007


On Fri, 19 Jan 2007 22:57:37 -0800, Rickard Lindberg wrote:

> Daniel Klein wrote:
> 
>> 2) This can be resolved with
>>
>> templine = ' ' + line + ' '
>> if ' ' + word1 + ' ' in templine and ' ' + word2 + ' ' in templine:
> 
> But then you will still have a problem to match the word "foo" in a
> string like "bar (foo)".

That's a good point for a general word-finder application, but in the case
of the Original Poster's problem, it depends on the data he is dealing
with and the consequences of errors.
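
To make Rickard's objection concrete, here is a minimal sketch comparing
the space-padding test with a word-boundary regex. The helper names
padded_match and boundary_match are made up for illustration and appear
nowhere else in this thread.

```python
import re

def padded_match(word, line):
    # Space-padding test: the word only counts if it has a space (or the
    # start/end of the line) on both sides.
    return ' ' + word + ' ' in ' ' + line + ' '

def boundary_match(word, line):
    # \b matches at the edge between a word character and a non-word
    # character, so punctuation such as parentheses also delimits a word.
    return re.search(r'\b' + re.escape(word) + r'\b', line) is not None

line = "bar (foo)"
print(padded_match("foo", line))    # False: "foo" is wrapped in parentheses
print(boundary_match("foo", line))  # True
print(padded_match("bar", line))    # True: "bar" is delimited by spaces
```

Whether the missed "(foo)" matters is exactly the data-dependent question.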

If the consequences are serious, then he may need to take extra
precautions. But if the consequences are insignificant, then the fastest,
most reliable solution is probably a simple generator:

def find_new_events(text):
    for line in text.splitlines():
        line = line.lower() # remove this for case-sensitive matches
        if "event" in line and "new" in line:
            yield line

To get all the matching lines at once, use list(find_new_events(text)).

This is probably going to be significantly faster than a regex.

So that's three possible solutions:

(1) Use a quick non-regex matcher, and deal with false positives later;

(2) Use a slow, potentially complicated regex; or

(3) Use a quick non-regex matcher to eliminate obvious non-matches, then
pass the results to a slow regex to eliminate any remaining false
positives.
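
As a rough sketch of option (3), assuming "match" means that both words
appear as whole words in either order, something like the following might
do. The function name find_new_events_strict and the two patterns are my
own assumptions, not anything the O.P. specified.

```python
import re

# Assumed patterns: each word must stand alone, delimited by word boundaries.
NEW_RE = re.compile(r'\bnew\b')
EVENT_RE = re.compile(r'\bevent\b')

def find_new_events_strict(text):
    for line in text.splitlines():
        line = line.lower()  # remove this for case-sensitive matches
        # Stage 1: cheap substring test throws out obvious non-matches.
        if "event" in line and "new" in line:
            # Stage 2: the regexes reject false positives such as
            # "renewed prevention", which contains both substrings.
            if NEW_RE.search(line) and EVENT_RE.search(line):
                yield line

sample = "New event posted\nRenewed prevention plan\nnothing here\nevent (new)"
result = list(find_new_events_strict(sample))
print(result)
```

On this sample the second line passes the substring test but is rejected
by the regexes, leaving only the first and last lines.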


Which is best will depend on the O.P.'s expected data. As always, resist
the temptation to guess which is faster, and instead use the timeit module
to measure it.
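
For example, a minimal timing sketch might look like this. The sample line,
the lookahead regex and the function names are placeholders of my own;
substitute data that resembles what the O.P. really has, since the numbers
will vary with line length and how often the words occur.

```python
import re
import timeit

# Placeholder test line; real measurements should use representative data.
LINE = "a new event was logged at noon"

# One possible regex: two lookaheads require both whole words, in any order.
BOTH_RE = re.compile(r'(?=.*\bnew\b)(?=.*\bevent\b)')

def substring_check(line=LINE):
    return "event" in line and "new" in line

def regex_check(line=LINE):
    return BOTH_RE.search(line) is not None

# timeit.timeit accepts a callable and runs it `number` times.
t_sub = timeit.timeit(substring_check, number=100000)
t_re = timeit.timeit(regex_check, number=100000)
print("substring: %.4f s" % t_sub)
print("regex:     %.4f s" % t_re)
```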


-- 
Steven.



