[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

Antoine Pitrou report at bugs.python.org
Mon Mar 8 16:05:49 CET 2010


Antoine Pitrou <pitrou at free.fr> added the comment:

On Mon, 08 Mar 2010 09:01:19 +0000,
Stefan Behnel <report at bugs.python.org> wrote:
> 
> Antoine, in the same comment, you say that it was not backported to
> Py2 in order to prevent breaking existing code, and then you ask if
> it's difficult to support in lxml. ;-)

I meant breaking existing *user* code. Besides, the fact that
compatibility is broken doesn't mean third-party code is difficult to
fix; hence my question.

> Supporting the same behaviour in lxml would either mean that it
> breaks existing code in Py2 (when making the API consistent), or that
> you can safely (and correctly) write the return value to a file in
> Py2, but that you can't do the same in Py3 (when adopting the change
> only in Py3).

Sorry, I don't understand this. Are you saying it's impossible
for you to define two different behaviours based on the current Python
version? What's wrong with
"""if sys.version_info >= (3, 0, 0): # blah"""
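
To make the idea concrete, here is a minimal sketch of such a
version switch. The helper name serialise_text is hypothetical, and
encoding="unicode" assumes an ElementTree version that accepts that
value (Python 3.2 and later, as far as I know); it is not a claim
about how lxml does or should implement this.

```python
import sys
import xml.etree.ElementTree as ET


def serialise_text(elem):
    """Hypothetical helper: pick a serialisation per Python version."""
    # sys.version_info is a tuple-like attribute, not a function,
    # so it is compared directly without being called.
    if sys.version_info >= (3, 0, 0):
        # Assumption: this ElementTree accepts encoding="unicode"
        # and returns a str for it.
        return ET.tostring(elem, encoding="unicode")
    # Python 2: the default serialisation is already a byte string.
    return ET.tostring(elem)


root = ET.Element("a")
result = serialise_text(root)
```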

> Previously, in ElementTree, serialising without an explicit encoding
> was a way to get a byte encoded serialisation without an XML
> declaration header, so I expect there to be code that depends on
> this.

This doesn't seem to be documented. The doc simply says
"""encoding is the output encoding (default is US-ASCII)""".

In other words, undocumented (and untested) behaviour has been "broken"
when porting to 3.0, which is the version which deliberately broke
compatibility for documented things. I guess we can live with it ;)

> Even the latest
> 3.2-dev docs still state that the default encoding of the serialiser
> is US-ASCII, not a word about *ever* returning a unicode string,
> especially not by default, and totally not the required big fat
> warning that writing to a file will fail with mysterious errors if no
> encoding is specified.

Ok, perhaps some documentation changes are in order :-)
(I wonder why the default was US-ASCII, though. Sounds a bit braindead)
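
For reference, a small sketch of the two return types under
discussion, assuming a Python 3 ElementTree in which the default
encoding yields bytes and encoding="unicode" yields str (which, to my
knowledge, is the behaviour from 3.2 onwards). The element name and
text are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

elem = ET.Element("greeting")
elem.text = "caf\u00e9"

# Default encoding (US-ASCII): a bytes object is returned.
as_bytes = ET.tostring(elem)

# Explicitly asking for "unicode": a str object is returned.
as_text = ET.tostring(elem, encoding="unicode")

# Writing the str result to a binary stream fails with a TypeError,
# which is the kind of error the quoted paragraph warns about.
binary_file = io.BytesIO()
try:
    binary_file.write(as_text)
except TypeError as exc:
    write_error = exc
else:
    write_error = None

# The bytes result can be written to the binary stream directly.
binary_file.write(as_bytes)
```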

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8047>
_______________________________________
