Ignoring incorrect XML encoding declarations

Peter Scott sketerpot at chase3000.com
Wed Jan 15 18:48:27 EST 2003


I'm trying to parse an XML file with xml.sax. For reasons beyond my 
control, the file is encoded in UTF-8 but begins with this incorrect 
declaration:
<?xml version="1.0" encoding="UTF-16"?>
The parser wisely spots the mismatch and raises a SAXParseException, so 
I can't parse the file. I tried catching the exception, printing an 
error, and carrying on with the parse, but that didn't work. Is there 
any way to get the SAX parser to ignore the 'encoding="UTF-16"' and 
parse the file with its real encoding?
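
For what it's worth, here is a rough sketch of one workaround that might 
do: rewrite the declaration in memory before the bytes ever reach the 
parser. It assumes a current Python 3, that the file really is UTF-8, and 
that it is small enough to read in one go. The names 
fix_declared_encoding, TextCollector and parse_with_real_encoding are 
made up for illustration and are not part of xml.sax.

```python
import re
import xml.sax

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of the data, e.g. <?xml version="1.0" encoding="UTF-16"?>
_DECL_ENCODING = re.compile(rb'^(<\?xml[^>]*?encoding=)(["\'])[^"\']*\2')


def fix_declared_encoding(data, real_encoding="UTF-8"):
    """Return data with the declared encoding replaced by real_encoding.

    Hypothetical helper: only the declaration is touched; the document
    body is passed through unchanged.
    """
    def _swap(match):
        quote = match.group(2)
        return match.group(1) + quote + real_encoding.encode("ascii") + quote

    return _DECL_ENCODING.sub(_swap, data, count=1)


class TextCollector(xml.sax.ContentHandler):
    """Toy handler that gathers character data, to show the parse works."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_with_real_encoding(data, handler, real_encoding="UTF-8"):
    """Parse raw bytes with SAX after correcting the declaration."""
    xml.sax.parseString(fix_declared_encoding(data, real_encoding), handler)
```

Usage would be along the lines of reading the file with open(path, "rb"), 
passing the bytes to parse_with_real_encoding together with my own 
ContentHandler, and leaving the file on disk untouched.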

Thanks,
Peter




