SAXParseException: not well-formed (invalid token)

Lawrence D'Oliveiro ldo at geek-central.gen.new_zealand
Fri Aug 31 01:20:03 EDT 2007


In message <mailman.137.1188481649.28954.python-list at python.org>, Carsten
Haese wrote:

> If your input string contains the byte 0xe9 where your accented e is,
> the file is actually latin-1 encoded. If it contains the byte sequence
> 0xc3,0xa9 it is UTF-8 encoded.

It is dismaying how often I come across Web pages that claim to be
UTF-8-encoded, but are actually Latin-1 or Dimdows-1252.
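
A rough way to apply the byte-level test quoted above is to try a strict
UTF-8 decode and fall back when it fails. The following is only a sketch:
guess_encoding is a made-up helper name, not part of any library, and it
assumes the input can only be UTF-8 or a single-byte encoding. It cannot
tell Latin-1 from Windows-1252, and pure-ASCII input is reported as UTF-8.

```python
def guess_encoding(data: bytes) -> str:
    """Hypothetical helper: report 'utf-8' if data is valid UTF-8, else 'latin-1'."""
    try:
        # Strict decoding raises UnicodeDecodeError on a stray 0xe9 byte.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is treated as Latin-1.
        return "latin-1"


latin1_bytes = b"caf\xe9"      # e-acute as the single byte 0xe9
utf8_bytes = b"caf\xc3\xa9"    # e-acute as the byte sequence 0xc3, 0xa9

print(guess_encoding(latin1_bytes))
print(guess_encoding(utf8_bytes))
```

Running a check like this on the raw bytes before handing them to the XML
parser shows whether the declared encoding matches the actual content.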


