How to force SAX parser to ignore encoding problems

Stefan Behnel stefan_ml at behnel.de
Fri Aug 7 02:40:35 EDT 2009


Łukasz wrote:
> I have a problem with my XML parser (created with libraries from
> the xml.sax package). When the parser finds an invalid character (in
> a CDATA section), for example �, it throws a SAXParseException.
> 
> Is there any way to just ignore this kind of problem? Maybe there is a
> way to set up the parser in a less strict mode?
> 
> I know that I can catch this exception, check whether it is this
> kind of problem, and then ignore it, but I am asking about a global
> setting.

The parser from libxml2 that lxml provides has a recovery option, i.e. it
can keep parsing regardless of errors and will drop the broken content.

However, it is *always* better to fix the input if you can get your hands on it.
Broken XML is *not* XML at all. If you can't fix the source, you can never
be sure that the data you received is in any way complete or even usable.

Stefan


