Module RE, Have a couple questions

Tue Mar 1 15:38:15 EST 2005

Francis Girard wrote:
> On Tuesday 1 March 2005 16:52, Marc Huffnagle wrote:
> 
>>[line for line in document if (line.find('word') != -1 \
>>        and line.find('wordtwo') != -1)]
> 
> 
> Hi,
> 
> Using re might be faster than scanning the same line twice:

My understanding of the second question was that he wanted to find lines 
which contained both words but, looking at it again, it could go either 
way.  If he wants to find lines that contain both of the words, in any 
order, then I don't think that it can be done without scanning the line 
twice (regex or not).
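
To make the two readings concrete, here is a minimal sketch of each.  The 
function names and the sample lines are made up for illustration; they are 
not from either of the earlier posts.  The "both" version tests each line 
twice, and the "either" version makes one regex pass per line.

```python
import re

def lines_with_both(lines, word, word2):
    # Two substring tests per line; "and" skips the second test
    # as soon as the first one fails.
    return [line for line in lines if word in line and word2 in line]

def lines_with_either(lines, word, word2):
    # One regex search per line; re.escape treats the words as literal text.
    pattern = re.compile("%s|%s" % (re.escape(word), re.escape(word2)))
    return [line for line in lines if pattern.search(line)]

# Hypothetical sample data.
sample = ["spam only\n", "eggs only\n", "eggs then spam\n", "neither\n"]
both = lines_with_both(sample, "spam", "eggs")
either = lines_with_either(sample, "spam", "eggs")
```

With this sample, only the third line has both words, while the first three 
lines have at least one of them.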

To the OP:  What kind of data are you testing?  Could you try both of 
these solutions on your sample data and let us know which runs faster?
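
If it helps, something along these lines would do the comparison with the 
timeit module.  This is only a sketch: the document list is placeholder 
data, and the two functions do not answer the same question (one finds 
lines with both words, the other lines with either word), so the numbers 
are only a rough guide.

```python
import re
import timeit

# Placeholder data; substitute lines from the real document.
document = ["spam only\n", "eggs only\n", "eggs then spam\n", "neither\n"] * 1000

either_re = re.compile("spam|eggs")

def with_find():
    # Lines containing both words, using two find() calls per line.
    return [line for line in document
            if line.find("spam") != -1 and line.find("eggs") != -1]

def with_regex():
    # Lines containing either word, using one regex search per line.
    return [line for line in document if either_re.search(line)]

find_seconds = timeit.timeit(with_find, number=20)
regex_seconds = timeit.timeit(with_regex, number=20)
print("find:  %.4f s" % find_seconds)
print("regex: %.4f s" % regex_seconds)
```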

> 
> === begin snap
> ## rewords.py
> 
> import re
> import sys
> 
> def iWordsMatch(lines, word, word2):
>   # Match lines containing either word; re.escape keeps the words literal.
>   reWordOneTwo = re.compile("%s|%s" % (re.escape(word), re.escape(word2)))
>   return (line for line in lines if reWordOneTwo.search(line))
>   
> for line in iWordsMatch(open("rewords.py"), "re", "return"):
>   sys.stdout.write(line)
> === end snap
> 
> Furthermore, by using a generator expression (Python 2.4 and later) and the 
> file iterator, you can scan files as big as you want with very little 
> memory usage.
> 
> Regards,
> 
> Francis Girard
>
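
On the memory point: a minimal sketch of the same idea for the "both words" 
case, written as a generator function so that only one line is held in 
memory at a time.  The function name and the temporary sample file are 
assumptions made so the example runs on its own.

```python
import os
import tempfile

def matching_lines(path, word, word2):
    # The file object is iterated line by line, so the whole file
    # is never loaded into memory at once.
    with open(path) as handle:
        for line in handle:
            if word in line and word2 in line:
                yield line

# Hypothetical sample file, written only so the sketch is self-contained.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("spam only\neggs then spam\nneither\n")

found = list(matching_lines(path, "spam", "eggs"))
os.remove(path)
```

In real use you would loop over matching_lines(...) directly rather than 
building a list, which is what keeps the memory usage low.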