Finding a \u0096
Gerhard Häring
gerhard.haering at opus-gmbh.net
Wed Dec 4 09:54:49 EST 2002
Gustaf Liljegren <gustafl at algonet.se> wrote:
> I'm using Python to automate some mechanisms in a Word to XML
> conversion. The XML file should be encoded in UTF-8. Since Word is
> using Microsoft's "ANSI" character set and I want Unicode in UTF-8,
> some characters need to be replaced. [...]
I don't agree with the conclusion that characters need to be replaced. I'd
rather say that the encoding needs to be changed.
First, you create a Unicode string from a known byte stream and a known encoding
(I don't know what 'ANSI' really is; let's take 'iso-8859-1' in this example):

    unicode_string = unicode(input_bytestream, "iso-8859-1")
Now let's create a byte stream from the Unicode string again, this time in the
UTF-8 encoding:

    output_bytestream = unicode_string.encode("UTF-8")
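
The two steps above can be sketched as one small function. This is only an illustration, written in Python 3 syntax, where bytes.decode() takes the place of the unicode() builtin. The helper name transcode and the sample bytes are hypothetical. Comparing against "cp1252" assumes that Word's "ANSI" means Windows code page 1252, which is not confirmed anywhere in this thread.

```python
def transcode(input_bytestream, source_encoding):
    """Decode bytes using a known source encoding, then re-encode as UTF-8."""
    unicode_string = input_bytestream.decode(source_encoding)
    return unicode_string.encode("utf-8")


# Hypothetical sample input: byte 0x96 sits between two years.
sample = b"1990\x962002"

# Decoded as iso-8859-1, byte 0x96 becomes the control character U+0096.
as_latin1 = transcode(sample, "iso-8859-1")

# Decoded as cp1252 (an assumption about the real source encoding),
# byte 0x96 becomes U+2013, the en dash.
as_cp1252 = transcode(sample, "cp1252")

print(repr(as_latin1))
print(repr(as_cp1252))
```

If a \u0096 shows up in the result, that suggests the source encoding was guessed wrong, not that the character needs to be replaced afterwards.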
Regards,
Gerhard
--
Gerhard Häring
OPUS GmbH München
Tel.: +49 89 - 889 49 7 - 32
http://www.opus-gmbh.net/