the tostring and XML methods in ElementTree
George Sakkis
george.sakkis at gmail.com
Wed May 17 04:09:42 EDT 2006
Fredrik Lundh wrote:
> mirandacascade at yahoo.com wrote:
>
> > I wanted to see what would happen if one used the results of a tostring
> > method as input into the XML method. What I observed is this:
> > a) beforeCtag.text is of type <type 'str'>
> > b) beforeCtag.text when printed displays: I'm confused
> > c) afterCtag.text is of type <type 'unicode'>
> > d) afterCtag.text when printed displays: I?m confused
>
> the XML file format isn't a Python string serialization format, it's an XML infoset
> serialization format.
>
> as stated in the documentation, ET always uses Unicode strings for text that
> contain non-ASCII characters. for text that *only* contains ASCII, it may use
> either Unicode strings or 8-bit strings, depending on the implementation.
>
> the behaviour if you're passing in non-ASCII text as 8-bit strings is undefined
> (which means that you shouldn't do that; it's not portable).
I was about to post a similar question when I found this thread.
Fredrik, can you explain why this is not portable? I'm currently using
(a variation of) the workaround below instead of ET.tostring, and it
works fine for me:
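    import xml.etree.ElementTree as ET  # or elementtree.ElementTree before Python 2.5

    def tostring(element, encoding=None):
        text = element.text
        if text:
            # fallback: text is already unicode, or no encoding was given
            text2 = text
            if not isinstance(text, basestring):
                text2 = str(text)
            elif isinstance(text, str) and encoding:
                # decode the 8-bit string so ET only sees Unicode text
                text2 = text.decode(encoding)
            element.text = text2
        s = ET.tostring(element, encoding)
        element.text = text  # restore the caller's original value
        return s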
Why isn't this the standard behaviour?
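
For comparison, here is a minimal sketch of the same idea applied at the point where text enters the tree, rather than around the call to tostring. It assumes a modern Python 3 interpreter, where str is the Unicode type and bytes is the 8-bit type. The helper name set_text and the UTF-8 default are illustrative assumptions, not part of ElementTree.

```python
import xml.etree.ElementTree as ET


def set_text(element, raw, encoding="utf-8"):
    """Hypothetical helper: decode 8-bit input before storing it as text."""
    if isinstance(raw, bytes):
        # The caller must know the encoding; UTF-8 is only an assumed default.
        raw = raw.decode(encoding)
    element.text = raw
    return element


elem = ET.Element("c")
# Non-ASCII input arriving as an 8-bit string (note the curly apostrophe).
set_text(elem, "I\u2019m confused".encode("utf-8"))

# Serialize to bytes, then parse the result back with ET.XML.
serialized = ET.tostring(elem, encoding="utf-8")
restored = ET.XML(serialized)

print(restored.text == elem.text)
```

Because the tree only ever holds Unicode text, the round trip through tostring and XML does not depend on how a particular implementation treats 8-bit strings.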
Thanks,
George