ElementTree surprise

Thu Aug 16 01:47:14 EDT 2007

Hello!

Paul Rubin writes:

> I have a doc with a bunch of fields like:
>
>     <foo bar="spam">stuff</foo>
>     <foo bar="penguin">other stuff</foo>
>
> and sometimes
>
>     <foo bar="parrot"></foo>
>
> I use ElementTree to parse the doc and I use the .text attribute
> to get "stuff" or "other stuff" in the spam and penguin examples.
>
> I'd expect .text to be the empty string in the parrot example, but
> instead it is None.

Technically, text is a node just like any element node.  In the
parrot example, there is no text node at all, rather than an empty
text node.
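A minimal sketch of what that looks like from Python, assuming the
standard-library xml.etree.ElementTree module and a made-up <doc>
wrapper element around the fields from the question:

```python
import xml.etree.ElementTree as ET

# The <doc> root is an assumption; the original post does not show it.
doc = ET.fromstring(
    '<doc>'
    '<foo bar="spam">stuff</foo>'
    '<foo bar="parrot"></foo>'
    '</doc>'
)

# Map each bar attribute to the element's .text value.
texts = {foo.get("bar"): foo.text for foo in doc.findall("foo")}
print(texts)
```

The spam entry carries the string "stuff", while the parrot entry is
None because no text was found between its start and end tags.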

> I can fix my script to deal with this, but it's surprising.  Is it
> intentional?  I could understand it being None if the doc had said
>
>    <foo bar="parrot"/>
>
> but that is different.

<foo bar="parrot"/> and <foo bar="parrot"></foo> are mapped to the
same thing by any XML parser, and I think an XML parser that passed
this difference on to a caller would not be standards-conforming.
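This is easy to check for ElementTree itself.  The sketch below only
shows how this one parser behaves, it does not prove anything about
the standard; the summary() helper is a hypothetical name used for
the comparison:

```python
import xml.etree.ElementTree as ET

empty_tag = ET.fromstring('<foo bar="parrot"/>')
start_end = ET.fromstring('<foo bar="parrot"></foo>')

def summary(elem):
    """Collect what a caller can observe about an element (hypothetical helper)."""
    return (elem.tag, dict(elem.attrib), elem.text, len(elem))

# Both spellings yield the same tag, attributes, text and child count.
same = summary(empty_tag) == summary(start_end)
print(same)
```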

> Disclaimer: I'm not even slightly an XML expert, I just find myself
> having to deal with a lot of it.

I think the question is how XMLish the access via ElementTree should
be.  While it is in principle correct that there is no text node in
the parrot example, it may be sensible to set .text to "" for
practical reasons.  As far as I can see, there is no such thing as an
empty text node in XML, so no ambiguity would occur.
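Until then, the caller can do the normalisation.  A small sketch,
assuming the standard-library xml.etree.ElementTree module; text_of()
is a hypothetical helper name, and the <doc> wrapper is made up for
the example:

```python
import xml.etree.ElementTree as ET

def text_of(elem):
    """Return the element's text, treating a missing text node as ""."""
    return elem.text or ""

doc = ET.fromstring('<doc><foo bar="parrot"></foo></doc>')
parrot = doc.find("foo")

normalised = text_of(parrot)      # "" instead of None
via_findtext = doc.findtext("foo")  # findtext() also gives "" for a found element without text
print(repr(normalised), repr(via_findtext))
```

Since an empty text node cannot occur anyway, mapping None to "" in
this way loses no information.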

Bye,
Torsten.

-- 
Torsten Bronger, aquisgrana, europa vetus
                                      Jabber ID: bronger at jabber.org
                      (See http://ime.webhop.org for ICQ, MSN, etc.)