Finding a \u0096

Wed Dec 4 09:54:49 EST 2002

Gustaf Liljegren <gustafl at algonet.se> wrote:
> I'm using Python to automate some mechanisms in a Word to XML
> conversion. The XML file should be encoded in UTF-8. Since Word is
> using Microsoft's "ANSI" character set and I want Unicode in UTF-8,
> some characters need to be replaced. [...]

I don't agree with the conclusion that chars need to be replaced. I'd rather
say that the the encoding needs to be changed.

First you create a Unicode string from a known byte stream and a known encoding
(I don't know what 'ANSI' really is, let's take 'iso-8859-1' in this example):

unicode_string = unicode(input_bytestream, "iso-8859-1")

Now let's create a byte stream from the Unicode string again, this time in the
UTF-8 encoding:

output_bytestream = unicode_string.encode("UTF-8")

Regards,

Gerhard
-- 
Gerhard Häring
OPUS GmbH München
Tel.: +49 89 - 889 49 7 - 32
http://www.opus-gmbh.net/