[issue1767933] Badly formed XML using etree and utf-16

Sat Oct 2 11:56:31 CEST 2010

Amaury Forgeot d'Arc <amauryfa at gmail.com> added the comment:

Python 3.1 improves the situation: the file looks more like UTF-16, except that the BOM ("\xff\xfe") is repeated throughout the output, probably once for every internal call to file.write().
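The suspected cause (one BOM per write() call) can be reproduced with the codecs module alone. This is a minimal sketch that assumes the writer encodes each fragment separately with the "utf-16" codec. It is not taken from the ElementTree source, and the fragment list is hypothetical. Note that "\xff\xfe" is the little-endian BOM; the "utf-16" codec uses the machine's native byte order.

```python
import codecs

# Hypothetical fragments standing in for separate file.write() calls.
chunks = ["<?xml version='1.0' encoding='UTF-16'?>", "<html>", "</html>"]
text = "".join(chunks)

# Encoding each fragment independently with the "utf-16" codec prepends
# a BOM (native byte order) to every fragment.
naive = b"".join(chunk.encode("utf-16") for chunk in chunks)

# A stateful incremental encoder emits the BOM only on its first call.
encoder = codecs.getincrementalencoder("utf-16")()
stateful = b"".join(encoder.encode(chunk) for chunk in chunks)
stateful += encoder.encode("", final=True)

print(naive.count(codecs.BOM_UTF16))      # 3
print(naive.decode("utf-16") == text)     # False: stray U+FEFF characters
print(stateful.decode("utf-16") == text)  # True
```

Writing through a single stateful encoder, such as one io.TextIOWrapper opened with encoding="utf-16", should avoid the repetition.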

Here is a test script that should work on both 2.7 and 3.1.

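from io import BytesIO
from xml.etree.ElementTree import ElementTree

content = "<?xml version='1.0' encoding='UTF-16'?><html></html>"

# Parse a UTF-16 encoded document (encode() prepends a single BOM)
source = BytesIO(content.encode('utf-16'))
tree = ElementTree()
tree.parse(source)

# Write it back out as UTF-16
output = BytesIO()
tree.write(output, encoding="utf-16")

# Compare the round trip with the input; a BOM repeated mid-stream
# decodes to a stray U+FEFF character and makes the comparison fail
assert output.getvalue().decode('utf-16') == content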
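The assertion above also compares the exact serialization, which ElementTree may legitimately change (for example, the text of the XML declaration, or writing the empty element as "<html />"). Here is a narrower check, offered as a sketch and not as the unit test this issue asks for. It looks only for repeated BOMs and confirms that the output parses back:

```python
from io import BytesIO
from xml.etree.ElementTree import ElementTree, fromstring

content = "<?xml version='1.0' encoding='UTF-16'?><html></html>"
tree = ElementTree()
tree.parse(BytesIO(content.encode("utf-16")))

output = BytesIO()
tree.write(output, encoding="utf-16")
data = output.getvalue()

# The "utf-16" codec consumes one leading BOM when decoding; any U+FEFF
# left in the text is a BOM that was written again mid-stream.
text = data.decode("utf-16")
assert "\ufeff" not in text, "BOM repeated inside the output"

# The written bytes should also parse back to an equivalent root element.
root = fromstring(data)
assert root.tag == "html"
```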

----------
stage: unit test needed -> needs patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue1767933>
_______________________________________