Error handling in SAX

Stefan Behnel stefan_ml at behnel.de
Sun May 4 01:46:40 EDT 2008


mrkafk at gmail.com wrote:
> (this is a repost, for it's been a while since I posted this text via
> Google Groups and it plain didn't appear on c.l.py - if it did appear
> anyway, apols)

It did, although some people have added Google Groups to their kill file.


> So I set out to learn handling three-letter-acronym files in Python,
> and SAX worked nicely until I encountered badly formed XMLs, like with
> bad characters in it (well Unicode supposed to handle it all but
> apparently doesn't),

If it's not well-formed, it's not XML. XML parsers are required to reject
input that is not well-formed.
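
To see what that rejection looks like from the SAX side, here is a minimal
sketch using the standard library's xml.sax module (Python 3 syntax). The
helper name check_well_formed and the two sample documents are made up for
illustration; they are not from the original post.

```python
import xml.sax


def check_well_formed(data):
    """Return None if data parses cleanly, else a 'line:column: message' string."""
    try:
        # A bare ContentHandler ignores all events; we only care about errors.
        xml.sax.parseString(data, xml.sax.ContentHandler())
    except xml.sax.SAXParseException as exc:
        # The exception carries the position at which the parser gave up.
        return "%d:%d: %s" % (
            exc.getLineNumber(),
            exc.getColumnNumber(),
            exc.getMessage(),
        )
    return None


# Hypothetical inputs: the second one never closes <item>.
good = b"<root><item>ok</item></root>"
bad = b"<root><item>oops</root>"

print(check_well_formed(good))
print(check_well_formed(bad))
```

The default error handler raises on the first fatal error, so catching
SAXParseException around the parse call is usually enough to find out where
the input stops being well-formed.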

If the input actually is well-formed XML and the problem is somewhere in your
code, but you can't see it through the SAX haze, try lxml. It also lets you
pass the expected encoding to the parser, overriding a broken encoding
declaration in the document.

http://codespeak.net/lxml/
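
A rough sketch of the encoding override, assuming lxml is installed (it is a
third-party package). The helper name parse_with_encoding and the sample
bytes are invented for illustration; the encoding keyword of
etree.XMLParser is the part that does the work.

```python
try:
    from lxml import etree  # third-party package, not in the standard library
except ImportError:
    etree = None  # lets the sketch load even where lxml is not installed


def parse_with_encoding(data, encoding):
    """Parse XML bytes, decoding them as `encoding` instead of the declared one."""
    parser = etree.XMLParser(encoding=encoding)
    return etree.fromstring(data, parser)


# Hypothetical input: the declaration claims UTF-8, but the bytes are Latin-1.
mislabeled = b'<?xml version="1.0" encoding="UTF-8"?><root>caf\xe9</root>'

if etree is not None:
    root = parse_with_encoding(mislabeled, "iso-8859-1")
    print(root.text)
```

This only helps when you actually know the real encoding; guessing wrong
trades a parse error for silently mangled text.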

Stefan


