String manipulation

Alexander Schmolck a.schmolck at gmail.com
Wed Apr 4 11:39:39 EDT 2007


All the code is untested, but should give you the idea. 

marco.minerva at gmail.com writes:

> Hi all!
> 
> I have a file in which there are some expressions such as "kindest
> regard" and "yours sincerely". I must create a Python script that
> checks if a text contains one or more of these expressions and, in
> this case, replaces the spaces in the expression with the character
> "_". For example, the text
> 
> Yours sincerely, Marco.
> 
> must be transformed into:
> 
> Yours_sincerely, Marco.
> 
> Now I have written this code:
> 
> filemw = codecs.open(sys.argv[1], "r", "iso-8859-1").readlines()
> filein = codecs.open(sys.argv[2], "r", "iso-8859-1").readlines()
> 
> mw = ""
> for line in filemw:
> 	mw = mw + line.strip() + "|"

That leaves one "|" too many at the end, which adds an empty alternative to
the pattern. In general, use join rather than repeated string concatenation
with +.

mwfind_re_string = "(%s)" % "|".join(line.strip() for line in filemw)
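Spelled out a bit more, the join approach might look like the sketch below. The helper name build_alternation is made up for illustration, and skipping blank lines and calling re.escape are additions that the original code does not have; they only matter if the phrase file contains empty lines or regex metacharacters.

```python
import re

def build_alternation(lines):
    # Hypothetical helper: strip each line and skip blank ones, so no
    # empty alternative ends up in the pattern.
    phrases = [line.strip() for line in lines if line.strip()]
    # re.escape is an addition here; it keeps characters such as "."
    # in a phrase from being read as regex syntax.
    return "(%s)" % "|".join(re.escape(phrase) for phrase in phrases)

pattern = build_alternation(["kindest regard\n", "yours sincerely\n", "\n"])
```

The code avoids print, so it should run unchanged under Python 2 and Python 3.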

> mwfind_re = re.compile(r"^(" + mw + ")",re.IGNORECASE|re.VERBOSE)


mwfind_re = re.compile(mwfind_re_string, re.IGNORECASE)
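The version above also drops the "^" anchor and the re.VERBOSE flag from the quoted code. A small sketch of why that matters, using a made-up sample text: re.VERBOSE makes the engine ignore unescaped whitespace in the pattern, and "^" only allows a match at the start of the text.

```python
import re

phrases = "(kindest regard|yours sincerely)"
text = "With kindest regard, and yours sincerely."

# Under re.VERBOSE the space inside each phrase is ignored, so the
# pattern effectively becomes "(kindestregard|yourssincerely)".
verbose_hits = re.compile(phrases, re.IGNORECASE | re.VERBOSE).findall(text)

# Without re.VERBOSE the space is matched literally.
plain_hits = re.compile(phrases, re.IGNORECASE).findall(text)

# A leading "^" restricts matches to the very start of the text.
anchored_hits = re.compile("^" + phrases, re.IGNORECASE).findall(text)
```

Here only plain_hits finds both phrases; the other two lists come back empty.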
 
> mwfind_subst = r"_"
> 
> for line in filein:

That doesn't work when you go line by line. What about a phrase split across
two lines, such as "kindest\nregard"? I think you're best off reading the
whole file in (don't forget to close the files, BTW).
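One way to read a whole file and have it closed for you is a with-block. The sketch below is only an illustration: read_text and collapse_lines are made-up names, io.open stands in for the codecs.open call in the quoted code, and the temporary file with its contents exists purely for the demonstration.

```python
import io
import os
import tempfile

def read_text(path, encoding="iso-8859-1"):
    # The with-block closes the file even if reading raises.
    with io.open(path, "r", encoding=encoding) as handle:
        return handle.read()

def collapse_lines(text):
    # Strip each line and rejoin with single spaces, so a phrase split
    # over a line break ("kindest\nregard") becomes "kindest regard".
    return " ".join(line.strip() for line in text.splitlines() if line.strip())

# Demonstration with a throwaway file (hypothetical contents).
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with io.open(path, "w", encoding="iso-8859-1") as handle:
        handle.write(u"With kindest\nregard,\nMarco.\n")
    collapsed = collapse_lines(read_text(path))
finally:
    os.remove(path)
```

Note that collapsing the lines this way throws away the original line breaks, which may or may not be acceptable for the output.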


> 	line = line.strip()
> 	if (line != ""):
> 		line = mwfind_re.sub(mwfind_subst, line)
> 		print line
> 
> It correctly identifies the expressions, but doesn't replace the
> character in the right way. How can I do what I want?

Use the fact that you can also use a function as a substitution.

print mwfind_re.sub(lambda match: match.group().replace(' ','_'), 
                    " ".join(line.strip() for line in filein))
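As a self-contained sketch of the function-as-substitution idea, using the example from the question: the names underscore_phrases and join_words are made up, and the phrase list is hard-coded here rather than read from a file.

```python
import re

def underscore_phrases(text, phrases):
    # Build one case-insensitive alternation from the phrases.
    pattern = re.compile("(%s)" % "|".join(phrases), re.IGNORECASE)

    def join_words(match):
        # match.group() is the text as it appeared in the input, so the
        # original capitalisation is preserved.
        return match.group().replace(" ", "_")

    # re.sub calls join_words once per match and uses its return value
    # as the replacement.
    return pattern.sub(join_words, text)

result = underscore_phrases("Yours sincerely, Marco.",
                            ["kindest regard", "yours sincerely"])
```

With this input, result should be "Yours_sincerely, Marco.", and text containing none of the phrases comes back unchanged.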

'as


