Newbie XML SAX Parsing: How do I ignore an invalid token?

"Martin v. Löwis" martin at v.loewis.de
Sat Jan 6 03:30:12 EST 2007


scott at crybabymaternity.com schrieb:
> My original posting has a funky line break character (it appears as an
> ascii square) that blows up my program, but it may or may not show up
> when you view my message.

Looking at your document, it seems that this "funky line break
character" is character \x1E, which, in latin-1, means "record
separator". It's indeed ill-formed to use it in XML.

> Is there a way to account for the invalid token in the error handler?

Not with a standard XML parser, no. The error you describe is a "fatal
error", and that's not something parsing can recover from. I recommend
that you filter this character out before passing it to the XML parser.
You can use the IncrementalParser interface to do so.

Regards,
Martin



More information about the Python-list mailing list