Match 2 words in a line of file

harvey.thomas at informa.com
Fri Jan 19 05:45:54 EST 2007


Rickard Lindberg wrote:

> I see two potential problems with the non regex solutions.
>
> 1) Consider a line: "foo (bar)". When you split it you will only get
> two strings, as split by default only splits the string on white space
> characters. Thus "'bar' in words" will return false, even though bar is
> a word in that line.
>
> 2) If you have a line something like this: "foobar hello" then "'foo'
> in line" will return true, even though foo is not a word (it is part of
> a word).
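
Both problems are easy to reproduce. The following is only a minimal sketch using the two sample lines quoted above; the variable names are made up for illustration.

```python
# Problem 1: str.split() with no arguments splits on whitespace only,
# so the parentheses stay attached to "bar".
line1 = "foo (bar)"
words = line1.split()
print(words)              # ['foo', '(bar)']
print('bar' in words)     # False, although "bar" is a word on the line

# Problem 2: "in" on a string is a substring test, not a word test.
line2 = "foobar hello"
print('foo' in line2)     # True, although "foo" is only part of a word
```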

Here's a solution using re.split:

import io
import re

# Split on runs of non-word characters (anything except letters, digits, underscore).
wordsplit = re.compile(r'\W+').split

def matchlines(fh, w1, w2):
    """Print each line of fh containing both w1 and w2 as whole words, ignoring case."""
    w1 = w1.lower()
    w2 = w2.lower()
    for line in fh:
        words = [x.lower() for x in wordsplit(line)]
        if w1 in words and w2 in words:
            print(line.rstrip())

test = """1st line of text (not matched)
2nd line of words (not matched)
3rd line (Word test) should match (case insensitivity)
4th line simple test of word's (matches)
5th line simple test of words not found (plural words)
6th line tests produce strange words (no match - plural)
7th line "word test" should find this
"""
matchlines(io.StringIO(test), 'test', 'word')
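
To see why the 4th test line matches while the 5th does not, it may help to look at the tokens that splitting on '\W+' produces. This is just a sketch: tokens() is a hypothetical helper, not part of the solution above, and it also drops the empty strings that re.split returns when a line starts or ends with a non-word character.

```python
import re

wordsplit = re.compile(r'\W+').split

def tokens(line):
    # Lowercased words of the line; empty strings from re.split are discarded.
    return [x.lower() for x in wordsplit(line) if x]

a = tokens("4th line simple test of word's (matches)\n")
b = tokens("5th line simple test of words not found (plural words)\n")

print(a)   # ['4th', 'line', 'simple', 'test', 'of', 'word', 's', 'matches']
print('word' in a and 'test' in a)   # True: the apostrophe splits "word's" into "word" and "s"
print('word' in b and 'test' in b)   # False: only the plural "words" appears
```

So the apostrophe is treated like any other separator, which is why "word's" counts as a match for 'word' but "words" does not.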



