Help needed: Unicode and file format problem

"Martin v. Löwis" martin at v.loewis.de
Tue Sep 21 14:40:09 EDT 2004


Pekka Niiranen wrote:
> In other words I have to open target file like this:
> fileObj = codecs.open( "File_to_be_modified", "w", "utf-8" )
> and then run Unicode regular expression to it, where read
> replacements are bytes that must be written out as UTF-8 strings.

You need to read the file into a Unicode string, perform the
replacement, and then write the result back out as UTF-8. Something
like this:

import codecs
import re

infile = codecs.open( "File_to_be_read", "r", "utf-8" )
outfile = codecs.open( "File_to_be_written", "w", "utf-8" )

# new_text is the replacement text, as a unicode string
regexp = re.compile("some_expression", re.U)
for line in infile:
     line = regexp.sub(new_text, line)
     outfile.write(line)

infile.close()
outfile.close()
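
For readers on Python 3, the same read, substitute, write steps can be
sketched roughly as below. This is only an illustration and not part of
the original reply: the function name replace_in_file and its parameters
are made up for this example, and it assumes both files are UTF-8 and
that the pattern never needs to match across a line break.

```python
import re


def replace_in_file(src_path, dst_path, pattern, replacement):
    """Copy src_path to dst_path, applying a regex substitution per line.

    Hypothetical helper for illustration; both files are assumed UTF-8.
    """
    # In Python 3, str patterns are Unicode-aware by default, so re.U
    # is not required.
    regexp = re.compile(pattern)
    # The built-in open() decodes and encodes when given an encoding,
    # and the with-statement closes both files even if an error occurs.
    with open(src_path, "r", encoding="utf-8") as infile, \
         open(dst_path, "w", encoding="utf-8") as outfile:
        for line in infile:
            outfile.write(regexp.sub(replacement, line))
```

Writing to a separate output file, as in the loop above, avoids
truncating the input before it has been read.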

HTH,
Martin


