[Tutor] Help! Character conversion from a rtf file.

Kent Johnson kent37 at tds.net
Fri Jun 20 20:21:32 CEST 2008


On Fri, Jun 20, 2008 at 12:46 PM, Chien Nguyen <chiennguyen at yahoo.com> wrote:
> Hi All,
> I am a newbie to Python. I did some reading on the web
> and got a basic understanding of the language. I'd like
> to learn the language by writing some simple programs rather than
> keep reading books. My first program will convert certain Unicode
> characters (let's say UTF-8) in an RTF file based on a mapping
> in another RTF file called an "RTF Control file". On each line
> of the Control file, there are 2 tokens separated by a TAB or a space.

That doesn't sound like an RTF file, more like UTF-8 text.

> The first token contains the character to be converted from,
> and the second token contains the character to be converted to.
>
> The program will write a new file that contains the mapped characters.
> If a character from the original file is not found in the Control file,
> the program just writes the same character to the new file.

Hopefully your reading has shown you the way to read and write files.
Look at the codecs module for reading and writing UTF-8 files.
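
For instance, here is a minimal sketch of a UTF-8 round trip with
codecs.open. The file name sample.txt and its contents are made up
for illustration; they are not part of your problem.

```python
import codecs
import os
import tempfile

# Hypothetical file name and contents, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')

# Writing: codecs.open encodes unicode text to UTF-8 bytes on the way out.
f = codecs.open(path, 'w', 'utf-8')
f.write(u'caf\u00e9\n')
f.close()

# Reading: the UTF-8 bytes are decoded back into unicode text.
f = codecs.open(path, 'r', 'utf-8')
text = f.read()
f.close()
```

After this runs, text holds the same unicode string that was written,
so the accented character survives the trip through the file.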

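Once you have the file data loaded, you can use the string's replace
method to change the characters. Note that replace returns a new
string rather than modifying the original, so the result has to be
assigned back.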

Something like this (*very rough*)

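import codecs

# Read the whole input file as unicode text
data = codecs.open('data.rtf', 'r', 'utf-8').read()
replacements = codecs.open('replace.rtf', 'r', 'utf-8')

for line in replacements:
  line = line.strip()
  if line:
    # 'from' is a reserved word in Python, so use other names
    old, new = line.split()
    # strings are immutable; keep the new string that replace() returns
    data = data.replace(old, new)
replacements.close()

# Write the converted text back out as UTF-8
f = codecs.open('newdata.rtf', 'w', 'utf-8')
f.write(data)
f.close()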
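
One thing to watch for with repeated replace calls: if the control
file maps a to b and also b to c, an original a ends up as c. If that
is not what you want, a single pass over the text avoids it. Here is
a rough sketch, assuming every token in the control file is exactly
one character. The names load_mapping and convert, and the sample
data, are just made up for the example.

```python
def load_mapping(lines):
    """Build a {from_char: to_char} dict from control-file lines."""
    mapping = {}
    for line in lines:
        parts = line.split()
        # Skip blank or malformed lines instead of raising an error.
        if len(parts) == 2:
            mapping[parts[0]] = parts[1]
    return mapping


def convert(text, mapping):
    """Map each character once; unmapped characters pass through unchanged."""
    return u''.join(mapping.get(ch, ch) for ch in text)


# Sample control lines (hypothetical): TAB or space separated, one blank line.
control = [u'a\tb\n', u'b c\n', u'\n']
mapping = load_mapping(control)
result = convert(u'abd', mapping)
```

Here result is u'bcd': each character is looked up once in the
original text, and d, which is not in the mapping, is written
unchanged.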

HTH,
Kent

