the tostring and XML methods in ElementTree

George Sakkis george.sakkis at gmail.com
Wed May 17 04:09:42 EDT 2006


Fredrik Lundh wrote:

> mirandacascade at yahoo.com wrote:
>
> > I wanted to see what would happen if one used the results of a tostring
> > method as input into the XML method.  What I observed is this:
> > a) beforeCtag.text is of type <type 'str'>
> > b) beforeCtag.text when printed displays: I'm confused
> > c) afterCtag.text is of type <type 'unicode'>
> > d) afterCtag.text when printed displays: I?m confused
>
> the XML file format isn't a Python string serialization format, it's an XML infoset
> serialization format.
>
> as stated in the documentation, ET always uses Unicode strings for text that
> contains non-ASCII characters.  for text that *only* contains ASCII, it may use
> either Unicode strings or 8-bit strings, depending on the implementation.
>
> the behaviour if you're passing in non-ASCII text as 8-bit strings is undefined
> (which means that you shouldn't do that; it's not portable).
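
To illustrate the quoted point, here is a minimal sketch of the tostring/XML
round trip when the text is a Unicode string throughout. It assumes a modern
Python 3 with xml.etree.ElementTree rather than the 2006 elementtree package
discussed in this thread, and the U+2019 apostrophe is only a guess at the
kind of non-ASCII character involved.

```python
import xml.etree.ElementTree as ET

# Build an element whose text holds a non-ASCII character
# (U+2019, a typographic apostrophe; an assumed stand-in for the
# character in the original report), stored as a Unicode string.
before = ET.Element("c")
before.text = "I\u2019m confused"

# Serialise to UTF-8 encoded bytes, then parse those bytes back.
data = ET.tostring(before, encoding="utf-8")
after = ET.XML(data)

# Compare the text before and after the round trip.
same_text = (after.text == before.text)
print(type(after.text).__name__, same_text)
```

In this sketch tostring produces encoded bytes and XML decodes them again, so
no 8-bit text with an unstated encoding ever has to be stored in the tree.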

I was about to post a similar question when I found this thread.
Fredrik, can you explain why this is not portable? I'm currently using
(a variation of) the workaround below instead of ET.tostring, and it
works fine for me:

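# Assumes ElementTree is already imported as ET, e.g.
#   from elementtree import ElementTree as ET
def tostring(element, encoding=None):
    # Temporarily normalise element.text, serialise, then restore it.
    text = element.text
    if text:
        text2 = text  # default: unicode text (or no encoding given) is left as-is
        if not isinstance(text, basestring):
            text2 = str(text)
        elif isinstance(text, str) and encoding:
            # 8-bit string: decode it with the caller-supplied encoding
            text2 = text.decode(encoding)
        element.text = text2
    try:
        s = ET.tostring(element, encoding)
    finally:
        # Restore the original text even if serialisation fails
        element.text = text
    return s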


Why isn't this the standard behaviour?

Thanks,
George



