Please help!! SAXParseException: not well-formed (invalid token)

kyosohma at gmail.com kyosohma at gmail.com
Tue Mar 27 11:16:27 EDT 2007


On Mar 27, 9:59 am, jvictor... at yahoo.fr wrote:
> I've been using the xml.sax.handler module to do event-driven parsing
> of XML files in this python application I'm working on. However, I
> keep having really pesky invalid token exceptions. Initially, I was
> only getting them on control characters, and a little "sed -e 's/
> [^[:print:]]/ /g' $1;" took care of that just fine. But recently, I've
> been getting these invalid token exceptions with n-tildes (like the n
> in España), smart/fancy/curly quotes and other seemingly harmless
> characters. Specifying encoding="utf-8" in the xml header hasn't
> helped matters.
>
> Any ideas? As a last resort, I'd be willing to scrub invalid
> characters.... it just seems strange that curly quotes and n-tildes
> wouldn't be valid XML! Is that really the case?
>
> TIA!
>
> Jason

Are you making sure to encode the strings you pass into the parser in
UTF-8 or UTF-16? This article was illuminating in that respect and may
be helpful in diagnosing your problem:

http://www.xml.com/pub/a/2002/11/13/py-xml.html?page=2
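
To make the question concrete, here is a minimal sketch in current Python 3
of what I suspect is going on. It is only a guess at your situation: the
sample document, the assumption that your files are really Latin-1, and
the names TextCollector, parse_text and to_utf8 are all made up for
illustration. The header says UTF-8, but the bytes are not UTF-8, so the
parser hits an invalid token at the n-tilde. Decoding with the real
encoding and re-encoding as UTF-8 before parsing avoids that.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Collects all character data seen during parsing (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # SAX may deliver text in several pieces, so accumulate them.
        self.chunks.append(content)


def parse_text(data: bytes) -> str:
    """Parse XML bytes with SAX and return the concatenated text content."""
    handler = TextCollector()
    xml.sax.parseString(data, handler)
    return "".join(handler.chunks)


def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes from their real encoding and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# Hypothetical input: the header claims UTF-8, but the bytes are Latin-1.
doc = '<?xml version="1.0" encoding="utf-8"?><place>España</place>'
latin1_bytes = doc.encode("latin-1")

try:
    parse_text(latin1_bytes)
except xml.sax.SAXParseException as exc:
    # The lone 0xF1 byte is not valid UTF-8, so the parser rejects it.
    print("parse failed:", exc.getMessage())

# Same data, converted to match what the header declares.
print(parse_text(to_utf8(latin1_bytes, "latin-1")))
```

If the files turn out to be in some other encoding (Windows-1252 is a
common source of curly quotes), pass that name to to_utf8 instead, or
change the encoding in the XML header to match the actual bytes.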

Mike



