String manipulation
Alexander Schmolck
a.schmolck at gmail.com
Wed Apr 4 11:39:39 EDT 2007
All the code is untested, but should give you the idea.
marco.minerva at gmail.com writes:
> Hi all!
>
> I have a file in which there are some expressions such as "kindest
> regard" and "yours sincerely". I must create a Python script that
> checks if a text contains one or more of these expressions and, in
> this case, replaces the spaces in the expression with the character
> "_". For example, the text
>
> Yours sincerely, Marco.
>
> Must be transformed into:
>
> Yours_sincerely, Marco.
>
> Now I have written this code:
>
> filemw = codecs.open(sys.argv[1], "r", "iso-8859-1").readlines()
> filein = codecs.open(sys.argv[2], "r", "iso-8859-1").readlines()
>
> mw = ""
> for line in filemw:
>     mw = mw + line.strip() + "|"
That leaves one "|" too many at the end, which adds an empty alternative to
the pattern. Generally, use join instead of many individual string +s.
mwfind_re_string = "(%s)" % "|".join(line.strip() for line in filemw)
> mwfind_re = re.compile(r"^(" + mw + ")",re.IGNORECASE|re.VERBOSE)
mwfind_re = re.compile(mwfind_re_string, re.IGNORECASE)
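
As a rough, self-contained sketch of the join approach: the sample lines below
are made up and stand in for the contents of the expressions file. Skipping
blank lines and calling re.escape are additions of mine, not something the
original code does. Note that re.VERBOSE is left out on purpose, since under
that flag unescaped spaces in the pattern are ignored.

```python
import re

# Hypothetical sample data standing in for the lines of the expressions file.
expression_lines = ["kindest regard\n", "yours sincerely\n", "\n"]

# Strip each line and skip blank ones, so the pattern has no empty alternative.
expressions = [line.strip() for line in expression_lines if line.strip()]

# re.escape (an addition here) keeps characters such as "." literal.
pattern_string = "(%s)" % "|".join(re.escape(e) for e in expressions)
mwfind_re = re.compile(pattern_string, re.IGNORECASE)

print(mwfind_re.search("Yours sincerely, Marco.").group())  # Yours sincerely
```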
> mwfind_subst = r"_"
>
> for line in filein:
That doesn't work when an expression is split across two lines. What about
"kindest\nregard"? I think you're best off reading the whole file in (don't
forget to close the files, BTW).
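
Something along these lines might do for the reading part. It is only a sketch:
the temporary file and its contents are invented so the example runs on its
own, and collapsing all whitespace into single spaces is one possible way to
make "kindest\nregard" matchable (it does throw away the original line breaks).

```python
import os
import tempfile

# Write a small sample file so the sketch is self-contained (hypothetical data).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="iso-8859-1") as tmp:
    tmp.write("With kindest\nregard,\nMarco.\n")
    sample_path = tmp.name

# The with block closes the file even if reading raises an exception.
with open(sample_path, "r", encoding="iso-8859-1") as infile:
    text = infile.read()

os.remove(sample_path)

# Collapse every run of whitespace, newlines included, into a single space.
normalized = " ".join(text.split())
print(normalized)  # With kindest regard, Marco.
```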
>     line = line.strip()
>     if (line != ""):
>         line = mwfind_re.sub(mwfind_subst, line)
>         print line
>
> It correctly identifies the expressions, but doesn't replace the
> character in the right way. How can I do what I want?
Use the fact that you can also use a function as a substitution.
print mwfind_re.sub(lambda match: match.group().replace(' ', '_'),
                    " ".join(line.strip() for line in filein))
'as
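
PS: a small runnable sketch of the function-as-substitution idea, in case it
helps. The pattern is hard-coded here purely for illustration, and
underscore_spaces is a name I made up. re.sub calls the function once per
match and uses whatever it returns as the replacement.

```python
import re

# Hard-coded pattern for illustration; normally built from the expressions file.
mwfind_re = re.compile("(kindest regard|yours sincerely)", re.IGNORECASE)

def underscore_spaces(match):
    """Return the matched expression with its spaces replaced by underscores."""
    return match.group().replace(" ", "_")

result = mwfind_re.sub(underscore_spaces, "Yours sincerely, Marco.")
print(result)  # Yours_sincerely, Marco.
```

Text outside the matched expressions is left alone, so only the spaces inside
an expression are touched.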