[Tutor] Iterating through a list of replacement regex patterns

David Hutto smokefloat at gmail.com
Fri Sep 3 04:24:00 CEST 2010


In the function below I'm trying to iterate over the lines of a text
file and match them against a regex pattern taken, one at a time, from
the lines of a dictionary (not a {} dict, but a text file of
alphabetical words turned into a list with readlines()).

import re

def regexfiles(filename):
	# Open the text file to search
	textfile = file(filename, 'r+')
	# Open the dictionary file
	dictionary = file('/var/www/main/american-english', 'r')
	search = 0
	readict = dictionary.readlines()
	readfile = textfile.readlines()
	select = 'abbott'
	for item in readict:
		search += 1
		# One long print statement, continued with backslashes
		print search, '\nselect =', select, 'item = ', item, \
			'readfile = ', str(readfile), \
			'\nre.findall =', re.findall(select, str(readfile)), \
			'\nre.search = ', re.search(select, str(readfile[:])), '\n'

My last entry that comes up is:

14
select = abbott
item =  abbott
len readfile =  6
readfile =  ['|aaolaachenaaliyahaaronabbasabbasidabbottsaaolaachenaaliyahaaronabbasabbasidabbott"aaolaachenaaliyahaaronabbasabbasidabbott}aaolaachenaaliyahaaronabbasabbasidabbottvaaolaachenaaliyahaaronabbasabbasidabbott']
re.findall = ['abbott', 'abbott', 'abbott', 'abbott', 'abbott']
re.search =  <_sre.SRE_Match object at 0x8838b80>

Which is fine until I try to iterate over the words in my 'dictionary'
list and use each new word in turn as the regex pattern.

If I replace select = 'abbott' (or whatever word I pick that is in the
file) with something like str(readict[13]), where readict is the list
of words and the entry at index 13 is also abbott, turned into a string
with a len one greater, I get:

14
select = abbott
item =  abbott
len readfile =  7
readfile =  ['|aaolaachenaaliyahaaronabbasabbasidabbottsaaolaachenaaliyahaaronabbasabbasidabbott']
re.findall = []
re.search =  None

re.findall and re.search come up empty ([] and None), even though the
variables print the same, apart from len, and they've both been turned
into strings.
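
One possible explanation for the extra character, sketched here under
the assumption that every line of the dictionary file ends with a
newline: readlines() keeps that trailing '\n', and str() of a list
shows a newline as the two characters backslash and n, so a pattern
containing a real newline has nothing to match. The word list and text
below are made-up stand-ins, not the real files, and the sketch is
written for Python 3.

```python
import io
import re

# Hypothetical stand-in for the dictionary file: one word per line.
dictionary = io.StringIO("abbasid\nabbott\nabbreviate\n")
readict = dictionary.readlines()

# Hypothetical stand-in for the text file contents.
readfile = ["aaronabbasabbott\n"]

select = readict[1]        # 'abbott\n', trailing newline included
haystack = str(readfile)   # repr of the list: newline appears as backslash + n

hardcoded_hits = re.findall('abbott', haystack)       # literal word
raw_hits = re.findall(select, haystack)               # word plus real newline
stripped_hits = re.findall(select.strip(), haystack)  # newline removed

print(repr(select), len(select))
print(hardcoded_hits, raw_hits, stripped_hits)
```

If that assumption holds, printing repr(select) instead of select would
make the hidden newline visible.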

So my main question is...drum roll please...how do I iterate through a
list of terms, using each one as the regex pattern, without getting the
second result?

Right now I can see that it searches the file for the term when it's a
literal in ' ', so that part works. On other attempts I can get it to
loop through the words in the dictionary list, swapping the regex
pattern as it goes, and then use readlines() to check the file's lines,
but even then the changing variable makes it show None.
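
A minimal sketch of one way such a loop could look, assuming the
changing pattern fails because each word from readlines() still
carries its trailing newline. The function name find_words and the
sample data are hypothetical, used only for illustration; the sketch
is written for Python 3 and joins the lines rather than calling str()
on the list.

```python
import re

def find_words(words, lines):
    """Return {word: [matches]} for each dictionary word found in lines."""
    text = "".join(lines)          # one searchable string, real newlines kept
    results = {}
    for raw in words:
        word = raw.strip()         # drop the newline left by readlines()
        if not word:
            continue               # skip blank lines in the word list
        pattern = re.escape(word)  # treat the word as literal text
        hits = re.findall(pattern, text)
        if hits:
            results[word] = hits
    return results

# Hypothetical sample data standing in for the two files.
readict = ["aaron\n", "abbott\n", "zebra\n"]
readfile = ["|aaolaachenaaliyahaaronabbasabbasidabbotts\n", "aaronabbott\n"]

found = find_words(readict, readfile)
print(found)
```

re.escape() is there in case a dictionary word contains a character
that has a special meaning in a regex.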

TIA,
David

