Module RE, Have a couple questions

Tue Mar 1 16:25:22 EST 2005

Hi,

This version might be even faster: with re.search the pattern no longer
needs the leading and trailing ".*", so the regex engine does not have to
consume the whole line.

Regards,

Francis Girard

=== BEGIN SNAP
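## rewords.py

import re
import sys

def iWordsMatch(lines, word, word2):
  # Escape both words so regex metacharacters in them are matched literally.
  word, word2 = re.escape(word), re.escape(word2)
  # Accept either order: word ... word2, or word2 ... word.
  reWordOneTwo = re.compile(r"(%s.*%s)|(%s.*%s)" %
                            (word, word2, word2, word))
  # search() looks for the pattern anywhere in the line.
  return (line for line in lines if reWordOneTwo.search(line))

for line in iWordsMatch(open("rewords.py"), "re", "return"):
  sys.stdout.write(line)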
=== END SNAP
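
As a quick sanity check that the re.search version and the earlier re.match
version select the same lines, here is a minimal sketch. The function names
search_version and match_version and the sample lines are made up for
illustration, and re.escape is my addition so that the words are taken
literally.

```python
import re

def search_version(lines, word, word2):
    # Unanchored pattern: search() looks for it anywhere in the line.
    w1, w2 = re.escape(word), re.escape(word2)
    pattern = re.compile(r"(%s.*%s)|(%s.*%s)" % (w1, w2, w2, w1))
    return [line for line in lines if pattern.search(line)]

def match_version(lines, word, word2):
    # match() anchors at the start of the line, so the pattern is
    # wrapped in ".*" to let the words appear anywhere.
    w1, w2 = re.escape(word), re.escape(word2)
    pattern = re.compile(r".*((%s.*%s)|(%s.*%s)).*" % (w1, w2, w2, w1))
    return [line for line in lines if pattern.match(line)]

# Hypothetical sample input, one string per line.
sample = [
    "import re\n",
    "  return re.compile(pattern)\n",
    "re.search(line) and then return\n",
    "nothing to see here\n",
]

found_search = search_version(sample, "re", "return")
found_match = match_version(sample, "re", "return")
```

Both calls keep only the second and third sample lines. Note that "re" is
matched as a substring (it also occurs inside "here"), so the last line is
rejected only because it lacks "return".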

On Tuesday 1 March 2005 21:57, Francis Girard wrote:
> On Tuesday 1 March 2005 21:38, Marc Huffnagle wrote:
> > My understanding of the second question was that he wanted to find lines
> > which contained both words but, looking at it again, it could go either
> > way.  If he wants to find lines that contain both of the words, in any
> > order, then I don't think that it can be done without scanning the line
> > twice (regex or not).
>
> I don't know if it is really faster but here's a version that finds both
> words on the same line. My understanding is that re needs to parse the line
> only once. This might matter on very large inputs.
>
> === Begin SNAP
> ## rewords.py
>
> import re
> import sys
>
> def iWordsMatch(lines, word, word2):
>   reWordOneTwo = re.compile(r".*((%s.*%s)|(%s.*%s)).*" %
>                             (word,word2,word2,word))
>   return (line for line in lines if reWordOneTwo.match(line))
>
> for line in iWordsMatch(open("rewords.py"), "re", "return"):
>   sys.stdout.write(line)
> === End SNAP
>
> Regards,
>
> Francis Girard
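
For comparison, the two-scan approach Marc describes above needs no regex at
all. This is only a sketch: the name lines_with_both and the sample lines are
hypothetical, and it uses plain substring tests rather than anything from the
original scripts.

```python
def lines_with_both(lines, word, word2):
    # Each "in" test scans the line independently, so a line that
    # contains the first word is scanned a second time for the other.
    return [line for line in lines if word in line and word2 in line]

# Hypothetical sample input, one string per line.
sample = [
    "import re\n",
    "  return re.compile(p)\n",
    "return\n",
    "nothing here\n",
]

found = lines_with_both(sample, "re", "return")
```

One difference from the regex versions: the substring tests allow the two
words to overlap. The line "return\n" is kept here because "re" occurs inside
"return", whereas the regex pattern requires the two words to appear one
after the other and would skip that line.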