Ignoring incorrect XML encoding declarations

Andrew Dalke adalke at mindspring.com
Thu Jan 16 00:12:16 EST 2003


Peter Scott wrote:
> I have an XML file which I'm trying to parse with xml.sax, but, for 
> reasons beyond my control, it uses the UTF-8 character encoding but has 
> this incorrect line at the beginning:
> <?xml version="1.0" encoding="UTF-16"?>

I had a related problem: a data stream that was Latin-1 but didn't
declare its encoding, so the parser treated it as UTF-8.  I fixed it
by wrapping the stream in a transcoder:


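import codecs
   ...
# Reads from the wrapper return UTF-8 bytes; the underlying
# stream is decoded as ISO-8859-1 (Latin-1).
infile = codecs.EncodedFile(infile, "utf-8", "iso-8859-1")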
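
Here is a small self-contained sketch of the same idea, in case it helps.
The sample document, the TextCollector handler, and the in-memory stream
are made up for illustration; only codecs.EncodedFile and xml.sax come
from the standard library.  It assumes the input really is Latin-1 and
carries no encoding declaration.

```python
import codecs
import io
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that gathers all character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


# Made-up sample: "cafe" with an accented e, stored as Latin-1 bytes,
# with no XML encoding declaration.
latin1_bytes = "<doc>caf\u00e9</doc>".encode("iso-8859-1")
raw = io.BytesIO(latin1_bytes)

# Second argument: encoding of the data handed back by read().
# Third argument: encoding of the bytes in the underlying stream.
wrapped = codecs.EncodedFile(raw, "utf-8", "iso-8859-1")
utf8_bytes = wrapped.read()

# The parser now sees valid UTF-8, its default for undeclared input.
handler = TextCollector()
xml.sax.parseString(utf8_bytes, handler)
text = "".join(handler.chunks)
```

Reading everything up front keeps the sketch short; for a large stream
you would more likely hand the wrapped file object to the parser.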


					Andrew
					dalke at dalkescientific.com




