Please help!! SAXParseException: not well-formed (invalid token)

jvictor118 at yahoo.fr
Tue Mar 27 13:11:21 EDT 2007


I checked the encoding of the file containing the n-tilde (ñ), and it
is indeed UTF-8! I'm baffled! Any ideas?
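
One way to double-check the "it is indeed UTF-8" claim is to decode the
raw bytes strictly and see where (if anywhere) decoding fails. The
sketch below is only an illustration: the helper name
first_invalid_utf8_offset is hypothetical, and the sample bytes stand in
for the real file, which would be read with open(path, "rb").read().

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")  # strict decoding raises on any invalid sequence
    except UnicodeDecodeError as exc:
        return exc.start
    return None


# Sample data standing in for the real file contents (assumption).
good = "España".encode("utf-8")    # ñ stored as two bytes: 0xC3 0xB1
bad = "España".encode("latin-1")   # ñ stored as the single byte 0xF1

print(first_invalid_utf8_offset(good))  # None
print(first_invalid_utf8_offset(bad))   # 4
```

If the helper returns an offset, the bytes around that position are the
ones worth inspecting with a hex dump.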

Thanks,
Jason

On Mar 27, 11:16 am, "Diez B. Roggisch" <d... at nospam.web.de> wrote:
> jvictor... at yahoo.fr wrote:
> > I've been using the xml.sax.handler module to do event-driven parsing
> > of XML files in this Python application I'm working on. However, I
> > keep having really pesky invalid token exceptions. Initially, I was
> > only getting them on control characters, and a little
> > "sed -e 's/[^[:print:]]/ /g' $1;" took care of that just fine. But
> > recently, I've been getting these invalid token exceptions with
> > n-tildes (like the n in España), smart/fancy/curly quotes and other
> > seemingly harmless characters. Specifying encoding="utf-8" in the XML
> > header hasn't helped matters.
>
> > Any ideas? As a last resort, I'd be willing to scrub invalid
> > characters.... it just seems strange that curly quotes and n-tildes
> > wouldn't be valid XML! Is that really the case?
>
> It's not the case, unless the encoding is wrong. Then the whole
> XML document isn't an XML document at all.
>
> Just putting an encoding header that doesn't match the actually used
> encoding won't fix that.
>
> Read up on what encodings are, and ensure your XML-generation respects that.
> Then reading these files will cause no problems.
>
> Diez
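
To illustrate Diez's point that a declaration alone does not change the
bytes, here is a minimal sketch using xml.sax. It assumes, purely for
the sake of the example, that the mislabeled data is really Latin-1; the
actual encoding of the files in question is not known from this thread.
The TextCollector class and parse_text helper are hypothetical names.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Collects all character data seen during parsing."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_text(xml_bytes):
    """Parse the given bytes and return the concatenated character data."""
    handler = TextCollector()
    xml.sax.parseString(xml_bytes, handler)
    return "".join(handler.chunks)


doc = '<?xml version="1.0" encoding="utf-8"?><pais>España</pais>'

# Declared as UTF-8 but actually written out as Latin-1 (assumed here).
mislabeled = doc.encode("latin-1")
try:
    parse_text(mislabeled)
    failed = False
except xml.sax.SAXParseException as exc:
    failed = True
    print("parse failed:", exc.getMessage())

# Decode with the real encoding, then re-encode to match the declaration.
fixed = mislabeled.decode("latin-1").encode("utf-8")
print(parse_text(fixed))
```

The alternative fix is to leave the bytes alone and change the
declaration to name the encoding that was actually used.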



