Regular Expression question
Fredrik Lundh
fredrik at pythonware.com
Thu Dec 1 13:09:06 EST 2005
Michelle McCall wrote:
>I have a script that needs to scan every line of a file for numerous
> strings. There are groups of strings for each "area" of data we are looking
> for. Looping through each of these list of strings separately for each line
> has slowed execution to a crawl. Can I create ONE regular expression from a
> group of strings such that when I perform a search on a line from the file
> with this RE it will search the line for each one of the strings in the RE ?

does

    m = re.search("spam|egg|bacon", line)

do what you want?

if you need all matches, you can use

    for m in re.finditer("spam|egg|bacon", line):
        ...
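
To make the difference between the two calls concrete, here is a minimal sketch. The sample line is made up for illustration; only re.search and re.finditer come from the reply above.

```python
import re

# Hypothetical sample line, not from the original post.
line = "bacon, egg and spam"

# search() stops at the first (leftmost) match, or returns None.
m = re.search("spam|egg|bacon", line)
first = m.group(0) if m else None

# finditer() yields every non-overlapping match, scanning left to right.
found = [m.group(0) for m in re.finditer("spam|egg|bacon", line)]

print(first)
print(found)
```

So search() is enough when you only need to know whether a line contains any of the strings, and finditer() is the one to use when you need to know which ones.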

if the strings are all literal strings (i.e. no subpatterns), a little
preparation might speed things up:

    import re

    words = ["spam", "spim", "spum", "spamwall", "wallspam"]
    words.sort() # lexical order
    words.reverse() # reverse it, so a longer word is tried before its own prefix
    pattern = "|".join(map(re.escape, words))
    pattern = re.compile(pattern) # compile once, reuse for every line

    for m in pattern.finditer(line):
        ...
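
Why the ordering matters: Python's alternation takes the first alternative that matches at a given position, not the longest one. The sketch below compares the unsorted and the reverse-sorted pattern. The helper name build_pattern and the sample line are assumptions made for this illustration, not part of the original post.

```python
import re

def build_pattern(words):
    # Hypothetical helper: reverse lexical order places each word
    # before any shorter word that is a prefix of it.
    ordered = sorted(words, reverse=True)
    return re.compile("|".join(map(re.escape, ordered)))

words = ["spam", "spim", "spum", "spamwall", "wallspam"]
line = "a spamwall and a wallspam"  # made-up sample line

# Unsorted: "spam" is listed before "spamwall", so it wins at that position.
naive = re.compile("|".join(map(re.escape, words)))
naive_hits = [m.group(0) for m in naive.finditer(line)]

# Reverse-sorted: "spamwall" is tried before "spam".
ordered_hits = [m.group(0) for m in build_pattern(words).finditer(line)]

print(naive_hits)
print(ordered_hits)
```

With the unsorted list, "spamwall" in the line is reported as "spam"; with the reverse-sorted list the full word is found.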
</F>