Module RE, Have a couple questions
Marc Huffnagle
mhuffnagle at knowtechnology.net
Tue Mar 1 15:38:15 EST 2005
Francis Girard wrote:
> On Tuesday 1 March 2005 16:52, Marc Huffnagle wrote:
>
>>[line for line in document if (line.find('word') != -1 \
>> and line.find('wordtwo') != -1)]
>
>
> Hi,
>
> Using the re module might be faster than scanning the same line twice:
My understanding of the second question was that he wanted to find lines
which contained both words but, looking at it again, it could go either
way. If he wants to find lines that contain both of the words, in any
order, then I don't think that it can be done without scanning the line
twice (regex or not).
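
For the "both words, any order" reading, here is a rough sketch of two ways to write the filter. The function names and the sample lines are made up for illustration. The regex version uses two lookaheads, so I'd expect the engine to still walk the line once per lookahead; it is not a true single pass.

```python
import re

def lines_with_both(lines, word, word2):
    """Yield lines containing both substrings, in any order (two scans per line)."""
    return (line for line in lines if word in line and word2 in line)

def lines_with_both_re(lines, word, word2):
    """Same filter with one compiled pattern built from two lookaheads."""
    # (?=.*a)(?=.*b) succeeds only if both substrings occur somewhere in the line.
    # re.escape keeps regex metacharacters in the words from being interpreted.
    pattern = re.compile(r"(?=.*%s)(?=.*%s)" % (re.escape(word), re.escape(word2)))
    return (line for line in lines if pattern.match(line))

# Hypothetical sample lines, not the OP's data.
sample = ["spam then eggs\n", "only spam here\n", "eggs before spam\n", "neither\n"]

for line in lines_with_both(sample, "spam", "eggs"):
    print(line, end="")
```

Note that both versions match substrings, so searching for 'word' will also hit lines that only contain 'wordtwo'.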
To the OP: What kind of data are you testing? Could you try both of
these solutions on your sample data and let us know which runs faster?
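
One possible way to run that comparison is with timeit, sketched below. The function names and the `document` list are placeholders I made up; real data should go in their place. Keep in mind that the two versions answer different questions (both words versus either word), so the timings are only a rough guide.

```python
import re
import timeit

def both_with_find(lines, word, word2):
    # Two substring scans per line; keeps lines containing both words.
    return [line for line in lines
            if line.find(word) != -1 and line.find(word2) != -1]

def either_with_re(lines, word, word2):
    # One compiled pattern; keeps lines containing either word.
    pattern = re.compile(r".*(%s|%s).*" % (word, word2))
    return [line for line in lines if pattern.match(line)]

# Hypothetical sample data; replace with lines from the real document.
document = ["alpha spam beta\n", "eggs and spam\n", "nothing here\n"] * 1000

t_find = timeit.timeit(lambda: both_with_find(document, "spam", "eggs"), number=20)
t_re = timeit.timeit(lambda: either_with_re(document, "spam", "eggs"), number=20)
print("find: %.4fs  re: %.4fs" % (t_find, t_re))
```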
>
> === begin snap
> ## rewords.py
>
> import re
> import sys
>
> def iWordsMatch(lines, word, word2):
>     # Matches lines containing either word (alternation), not necessarily both.
>     reWordOneTwo = re.compile(r".*(%s|%s).*" % (word, word2))
>     return (line for line in lines if reWordOneTwo.match(line))
>
> for line in iWordsMatch(open("rewords.py"), "re", "return"):
>     sys.stdout.write(line)
> === end snap
>
> Furthermore, using a generator expression (Python 2.4 and later, I think)
> and the file iterator, you can scan files as big as you want with very
> little memory usage.
>
> Regards,
>
> Francis Girard
>
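
On the memory point, a small sketch of why the generator approach stays cheap: lines are pulled from the file object one at a time, and only when the consumer asks for the next match. `iter_matching` is a name I made up, and `io.StringIO` stands in for a real file here so the example is self-contained.

```python
import io
import re

def iter_matching(lines, pattern):
    """Lazily yield lines in which the pattern is found anywhere."""
    compiled = re.compile(pattern)
    # Generator expression: nothing is read until the caller iterates.
    return (line for line in lines if compiled.search(line))

# Stand-in for open("rewords.py"); any iterable of lines works the same way.
fake_file = io.StringIO("import re\nx = 1\nreturn x\n")

matches = iter_matching(fake_file, r"re|return")
first = next(matches)   # reads only as far as the first matching line
rest = list(matches)    # consumes the remainder of the file
```

With a real file the same call would be `iter_matching(open(path), ...)`, and at no point does the whole file need to sit in memory.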