how to write unicode to a txt file?

Jean-Paul Calderone exarkun at divmod.com
Wed Jan 17 11:40:11 EST 2007


On 17 Jan 2007 08:28:14 -0800, Frank Potter <could.net at gmail.com> wrote:
>I want to change an srt file to unicode format so mplayer can display
>Chinese subtitles properly.
>I did it like this:
>
>txt=open('dmd-guardian-cd1.srt').read()
>txt=unicode(txt,'gb18030')
>open('dmd-guardian-cd1.srt','w').write(txt)
>
>But it seems that python can't directly write unicode to a file,
>I got an error at the 3rd line:
>UnicodeEncodeError: 'ascii' codec can't encode characters in position
>85-96: ordinal not in range(128)
>
>How to save the unicode string to the file, please?

You cannot write a unicode object directly to a file.  Files can only
contain bytes.  Instead, encode the unicode object into a str and then
write the str to the file.

    f = open('...', 'w')
    f.write(txt.encode(encoding))
    f.close()
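
For a fuller picture, here is a minimal sketch of the whole round trip:
read bytes, decode them, then encode again before writing.  The helper
name convert_encoding and the sample file names are made up for
illustration, and the sample text is just a stand-in for real subtitle
data.  The files are opened in binary mode so that the code behaves the
same on Python 3, where the byte string type is called bytes.

```python
import os
import tempfile

def convert_encoding(src_path, dst_path, src_encoding, dst_encoding):
    """Decode the bytes in src_path and write them re-encoded to dst_path.

    Hypothetical helper, not part of any library.
    """
    with open(src_path, 'rb') as src:
        raw = src.read()                      # bytes exactly as stored on disk
    text = raw.decode(src_encoding)           # bytes -> unicode
    with open(dst_path, 'wb') as dst:
        dst.write(text.encode(dst_encoding))  # unicode -> bytes
    return text

# Demonstration with throwaway files (paths and contents are assumptions).
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, 'sample-gb18030.srt')
dst_path = os.path.join(tmp_dir, 'sample-utf8.srt')

sample = u'\u4e2d\u6587\u5b57\u5e55'  # four Chinese characters
with open(src_path, 'wb') as f:
    f.write(sample.encode('gb18030'))

convert_encoding(src_path, dst_path, 'gb18030', 'utf-8')

with open(dst_path, 'rb') as f:
    result = f.read()
```

Writing to a separate output file, rather than overwriting the input,
keeps the original intact if the decoding step fails partway.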

The encoding you select must be able to represent each code point in the
unicode object you encode with it.  The "ascii" encoding cannot encode all
of your code points, so when Python implicitly uses it, you get the
exception above.  If you use an encoding like "utf-8" which can represent
every code point, you won't see this exception.  You could also use the same
encoding which you used to decode the str originally, "gb18030", which will
be able to represent every code point, since it produced them in the first
place.  Of course, if you do this, you may as well not even bother to do the
decoding in the first place, unless you have some code in between the input
and output steps which manipulates the unicode in some way.
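
To make the point about encodings concrete, here is a small sketch with
an arbitrary two-character sample string (my own choice, not taken from
the subtitle file).  It shows "ascii" rejecting the text while "utf-8"
and "gb18030" both encode it and decode back to the same unicode value.

```python
text = u'\u4e2d\u6587'  # two Chinese characters, outside the ASCII range

# "ascii" only covers code points 0-127, so this encode fails.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Both of these encodings can represent the sample text.
utf8_bytes = text.encode('utf-8')
gb_bytes = text.encode('gb18030')

# Decoding with the matching encoding recovers the original text.
utf8_roundtrip = utf8_bytes.decode('utf-8')
gb_roundtrip = gb_bytes.decode('gb18030')
```

The two byte strings differ even though they represent the same text,
which is why the reader of the file has to know which encoding was used.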

Jean-Paul


