encoding="utf8" ignored when parsing XML

Skip Montanaro skip.montanaro at gmail.com
Tue Dec 27 10:47:16 EST 2016


Peter> Isn't UTF-8 the default?

Apparently not. From what I read, io.open() uses whatever
locale.getpreferredencoding() returns. That's problematic when you
live in a country that thinks ASCII is everything. Personally, I think
UTF-8 should be the default, but that train left the station long ago,
at least for Python 2.x.
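
For illustration, here is a minimal sketch of that default and of overriding it. The sample string and the temporary file are made up for the example, and the fallback described in the comment is how I understand text mode to behave on the Python versions discussed here, so treat it as an assumption rather than gospel.

```python
import io
import locale
import os
import tempfile

# Assumption: this is the encoding io.open() falls back to in text mode
# when no encoding argument is given (it depends on the platform/locale).
default_encoding = locale.getpreferredencoding(False)

# Hypothetical sample text containing one non-ASCII character.
sample = u"caf\u00e9"

fd, fname = tempfile.mkstemp(suffix=".txt")
try:
    # Write the text as UTF-8 bytes.
    with os.fdopen(fd, "wb") as f:
        f.write(sample.encode("utf-8"))
    # Read it back, naming the encoding explicitly instead of
    # relying on the locale-derived default.
    with io.open(fname, "r", encoding="utf-8") as f:
        content = f.read()
finally:
    os.remove(fname)
```

Passing encoding="utf-8" explicitly makes the read independent of whatever the locale happens to report.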

> Try opening the file in binary mode then:
>
> with io.open(fname, "rb") as f:
>     root = xml.etree.ElementTree.parse(f).getroot()

Thanks, that worked. I'd appreciate an explanation of why binary
mode was necessary. Since the file contents are text, just in a
non-ASCII encoding, it would seem that specifying the encoding when
opening the file should do the trick.
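
A small sketch of the binary-mode approach, for reference. My guess, not confirmed anywhere in this thread, is that when the parser is handed raw bytes it decodes them itself using the encoding named in the XML declaration. The document below is a made-up sample, declared as ISO-8859-1 to make that visible.

```python
import io
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical sample document: declared as ISO-8859-1, so the
# e-acute is stored as the single byte 0xE9.
xml_bytes = b'<?xml version="1.0" encoding="ISO-8859-1"?><root>caf\xe9</root>'

fd, fname = tempfile.mkstemp(suffix=".xml")
try:
    with os.fdopen(fd, "wb") as f:
        f.write(xml_bytes)
    # Binary mode: the parser receives undecoded bytes and applies
    # the encoding from the XML declaration on its own.
    with io.open(fname, "rb") as f:
        root = ET.parse(f).getroot()
finally:
    os.remove(fname)

# The element text comes back as a decoded (unicode) string.
text = root.text
```

If that guess is right, the file never needs to be decoded by io.open() at all, which would explain why the encoding argument was beside the point.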

Skip


