[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

Stefan Behnel report at bugs.python.org
Fri Mar 12 10:38:28 CET 2010


Stefan Behnel <scoder at users.sourceforge.net> added the comment:

"'None' has always been the documented default for the encoding parameter"

What I meant here was that "help(ET.tostring)" will show you that as the default. Also, in the docs, the signature is "tostring(tree, encoding=None)", so None is the documented default value for the argument, regardless of the internal handling.
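
For reference, a minimal sketch of how to check the declared default programmatically instead of reading help(). It assumes the xml.etree.ElementTree module bundled with a current CPython 3 release, whose signature has more parameters than the one quoted above; the name "encoding_default" is only illustrative.

```python
import inspect
import xml.etree.ElementTree as ET

# Read the declared default of the "encoding" parameter straight from
# the function signature; this is what help(ET.tostring) displays.
encoding_default = inspect.signature(ET.tostring).parameters["encoding"].default
print(encoding_default)
```

This only shows what the signature advertises; what the serialiser does internally when it sees that value is a separate question.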


> "writing out the Unicode serialisation will result in an incorrect
> XML serialisation"
> I think Guido meant the ElementTree.write method; is that broken too?

Yes, the feature has been implemented deep down in the _encode() helper function, so it impacts the entire serialiser, not only its API.
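
To illustrate the concern about writing out a Unicode serialisation, here is a small sketch. It assumes the ElementTree shipped with a current CPython 3 release, where tostring() returns bytes by default and returns str only when encoding="unicode" is requested explicitly; that may differ from the py3k state discussed in this issue. The element name and text are made up for the example.

```python
import xml.etree.ElementTree as ET

root = ET.Element("greeting")
root.text = "caf\u00e9"  # contains one non-ASCII character

as_bytes = ET.tostring(root)                     # default: encoded bytes
as_text = ET.tostring(root, encoding="unicode")  # explicit request: str

# A str has no byte encoding and carries no XML declaration. If it is
# later encoded with something other than UTF-8, a parser that assumes
# UTF-8 (the XML default without a declaration) cannot read the result.
bad_bytes = as_text.encode("latin-1")
try:
    ET.fromstring(bad_bytes)
    parsed_ok = True
except ET.ParseError:
    parsed_ok = False

print(type(as_bytes).__name__, type(as_text).__name__, parsed_ok)
```

The point is that the text form is only safe as long as whoever writes it out picks an encoding that matches what the document declares, or UTF-8 if it declares nothing.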


> I think I'd prefer old "tostring" behaviour and a separate "tounicode" function, and I'm still not convinced that the latter is required for the XML use case (which implies that maybe it should live in lxml.html for the HTML case, even if it ends up calling the same internal implementation).

I obviously agree that the use case for XML is feeble, but that alone doesn't make this a convincing argument to move it into lxml.html when the implementation will stay in lxml.etree anyway. Besides, that's pretty off-topic for this bug tracker.


> Or should that be "tobytes" and "tounicode" to eliminate all ambiguity?

That might be the clean break-all-bridges solution, but I don't think the name tostring() is so inherently broken in Py3 that it needs fixing. It's not "tostr()", for example.

I wouldn't raise much opposition against tobytes() as an alias for tostring(), although that sounds more like duplicating an otherwise simple API.
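
A rough sketch of what such an alias could look like, purely for illustration. tobytes() here is hypothetical and not part of ElementTree or lxml; the sketch assumes a current CPython 3 ElementTree in which tostring() accepts encoding="unicode" to return text, and simply refuses that value so the result is always bytes.

```python
import xml.etree.ElementTree as ET

def tobytes(element, encoding="us-ascii"):
    """Hypothetical alias for tostring() that never returns str."""
    # "unicode" is the one encoding value that makes tostring() return
    # text instead of bytes, so reject it here.
    if encoding.lower() == "unicode":
        raise ValueError("tobytes() requires a byte encoding")
    return ET.tostring(element, encoding=encoding)

data = tobytes(ET.Element("a"), encoding="utf-8")
print(type(data).__name__)
```

As the sketch shows, the alias adds almost nothing over calling tostring() directly, which is the duplication argument made above.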

Stefan

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8047>
_______________________________________

