SAXParseException: not well-formed (invalid token)
Carsten Haese
carsten at uniqsys.com
Thu Aug 30 09:47:21 EDT 2007
On Thu, 2007-08-30 at 15:20 +0200, Pablo Rey wrote:
> Hi Stefan,
>
> The xml has specified an encoding (<?xml version="1.0" encoding="UTF-8"
> ?>).
It's possible that the encoding specification is incorrect:
>>> u = u"\N{LATIN SMALL LETTER E WITH ACUTE}"
>>> print repr(u.encode("latin-1"))
'\xe9'
>>> print repr(u.encode("utf-8"))
'\xc3\xa9'
If your input string contains the single byte 0xe9 where your accented e
is, the file is actually latin-1 encoded (or a close relative such as
cp1252), whatever the XML declaration says. If it contains the byte
sequence 0xc3 0xa9, it is UTF-8 encoded.
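If you'd rather not eyeball the bytes, you can let the UTF-8 decoder point
at the first offending byte for you. This is only a rough sketch written
for a current Python; find_invalid_utf8 and the sample strings are names I
made up for illustration, not anything from your code:

```python
def find_invalid_utf8(data):
    """Return the offset of the first byte that is not valid UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte the decoder rejected
        return exc.start
    return None

# Hypothetical sample data: the same text in the two encodings
latin1_bytes = u"caf\u00e9".encode("latin-1")
utf8_bytes = u"caf\u00e9".encode("utf-8")

print(find_invalid_utf8(latin1_bytes))
print(find_invalid_utf8(utf8_bytes))
```

An offset back means the data is not UTF-8 at that position; None means the
whole string decodes cleanly as UTF-8.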
If the string is encoded in latin-1, you can transcode it to utf-8 like
this:
contents = contents.decode("latin-1").encode("utf-8")
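Put together with the parser, that might look something like the sketch
below. It assumes a current Python and that anything which isn't valid
UTF-8 really is latin-1, which is a guess: decoding as latin-1 never fails,
so it cannot tell you when the guess is wrong. ensure_utf8, TextCollector
and the sample document are made up for the example:

```python
import xml.sax

def ensure_utf8(contents):
    """Return contents as UTF-8 bytes, assuming latin-1 if it is not valid UTF-8."""
    try:
        contents.decode("utf-8")
        return contents
    except UnicodeDecodeError:
        # Assumption: the only other encoding we expect is latin-1
        return contents.decode("latin-1").encode("utf-8")

class TextCollector(xml.sax.ContentHandler):
    """Collect all character data seen by the parser."""
    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)

# Hypothetical document: declared as UTF-8 but actually latin-1 encoded
raw = u'<?xml version="1.0" encoding="UTF-8"?><name>Jos\u00e9</name>'.encode("latin-1")

handler = TextCollector()
xml.sax.parseString(ensure_utf8(raw), handler)
text = u"".join(handler.chunks)
```

The other way around works too: leave the bytes alone and fix the encoding
named in the XML declaration so that it matches what the file really
contains.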
HTH,
--
Carsten Haese
http://informixdb.sourceforge.net