[New-bugs-announce] [issue6233] ElementTree (py3k) doesn't properly encode characters that can't be represented in the specified encoding
Neil Muller
report at bugs.python.org
Sun Jun 7 23:31:00 CEST 2009
New submission from Neil Muller <drnlmuller+bugs at gmail.com>:
In py3k, ElementTree no longer correctly converts characters to entities
when they can't be represented in the requested output encoding.
Python 2:
>>> import xml.etree.ElementTree as ET
>>> e = ET.XML("<?xml version='1.0' encoding='iso-8859-1'?><body>t\xe3t</body>")
>>> ET.tostring(e, 'ascii')
"<?xml version='1.0' encoding='ascii'?>\n<body>t&#227;t</body>"
Python 3:
>>> import xml.etree.ElementTree as ET
>>> e = ET.XML("<?xml version='1.0' encoding='iso-8859-1'?><body>t\xe3t</body>")
>>> ET.tostring(e, 'ascii')
.....
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-2: ordinal not in range(128)
It looks like _encode_entity is no longer called anywhere in ElementTree;
it should probably be called from _encode for characters that can't be
represented in the target encoding.
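The suggested fallback can be sketched in modern Python 3. This is a minimal sketch, assuming the fix is simply to emit numeric character references for text content that the target codec cannot represent. encode_text is a hypothetical helper for illustration, not ElementTree's own _encode or _encode_entity. It relies on the standard "xmlcharrefreplace" codec error handler, which produces the same decimal &#NNN; form shown in the Python 2 output.

```python
def encode_text(text: str, encoding: str) -> bytes:
    """Escape XML markup characters in text content, then encode it,
    replacing any character the codec cannot represent with a decimal
    character reference.

    Hypothetical helper for illustration; not ElementTree's _encode.
    """
    # Replace '&' before '<' and '>' so the new '&lt;'/'&gt;' are not
    # re-escaped. Escaping happens before encoding, so the references the
    # codec produces below are left untouched.
    text = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    # "xmlcharrefreplace" is a built-in codec error handler: unencodable
    # characters become references such as "&#227;".
    return text.encode(encoding, "xmlcharrefreplace")


if __name__ == "__main__":
    # Strict encoding raises UnicodeEncodeError, as in the py3k report.
    try:
        "t\xe3t".encode("ascii")
    except UnicodeEncodeError as exc:
        print("strict:", exc)
    print(encode_text("t\xe3t", "ascii"))       # b't&#227;t'
    print(encode_text("t\xe3t", "iso-8859-1"))  # b't\xe3t'
```

Characters the requested encoding can hold pass through unchanged; only the unrepresentable ones become references. Attribute values would additionally need quote escaping, which this sketch omits.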
----------
components: Library (Lib)
messages: 89058
nosy: Neil Muller, effbot, hodgestar
severity: normal
status: open
title: ElementTree (py3k) doesn't properly encode characters that can't be represented in the specified encoding
type: behavior
versions: Python 3.0, Python 3.1
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue6233>
_______________________________________