Python UTF-8 and codecs

Serge Orlov serge.orlov at gmail.com
Tue Jun 27 16:29:51 EDT 2006


On 6/27/06, Mike Currie <dev at null.com> wrote:
> I'm trying to write out files that have utf-8 characters 0x85 and 0x08 in
> them.  With every configuration I try, I get a UnicodeError: ascii codec
> can't decode byte 0x85 in position 255: ordinal not in range(128)
>
> I've tried using codecs.open('foo.txt', 'rU', 'utf-8', errors='strict')
> and that doesn't work, and I've also tried wrapping the file in a
> utf8_writer using codecs.lookup('utf8')
>
> Any clues?

Use unicode strings for non-ascii characters. The following program "works":

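import codecs

# unichr() returns a one-character unicode string (Python 2).
c1 = unichr(0x85)
# Plain 'w' mode: the universal-newline flag 'U' only applies to reading.
f = codecs.open('foo.txt', 'w', 'utf-8')
f.write(c1)  # encoded on write as the UTF-8 bytes 0xC2 0x85
f.close()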

But unichr(0x85) is a control character (U+0085, NEL, "next line"); are
you sure you want it? What is the encoding of your data?
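
If what you really have is a byte string rather than unicode, something
like the sketch below may show where the error comes from and why the
source encoding matters. It is only an illustration under assumptions: I
don't know your real encoding, so Latin-1 and Windows-1252 are guesses, the
scratch file path is made up, and the names raw, as_latin1, as_cp1252 and
on_disk are mine. It should run on Python 2.6 or later and on Python 3.

```python
import codecs
import os
import tempfile

raw = b'\x85\x08'  # the two bytes from the question, as a byte string

# Decoding as ASCII fails in the same way as the reported error.
try:
    raw.decode('ascii')
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

# Assumption: the data is Latin-1 or Windows-1252; the real encoding is unknown.
as_latin1 = raw.decode('latin-1')  # 0x85 becomes U+0085 (NEL control character)
as_cp1252 = raw.decode('cp1252')   # 0x85 becomes U+2026 (horizontal ellipsis)

# Wrap a binary file in a UTF-8 writer and hand it a unicode string.
path = os.path.join(tempfile.mkdtemp(), 'foo.txt')  # hypothetical scratch file
writer = codecs.getwriter('utf-8')(open(path, 'wb'))
writer.write(as_latin1)
writer.close()

# Read the raw bytes back to see what actually landed on disk.
g = open(path, 'rb')
on_disk = g.read()
g.close()
```

With the Latin-1 guess, on_disk holds three bytes: 0xC2 0x85 for U+0085,
then 0x08. With the Windows-1252 guess you would get the ellipsis instead,
so the answer to the encoding question changes what ends up in the file.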


