ElementTree surprise

Stefan Behnel stefan.behnel-n05pAM at web.de
Thu Aug 16 03:18:18 EDT 2007


Paul Rubin wrote:
> Torsten Bronger <bronger at physik.rwth-aachen.de> writes:
>>>     <foo bar="parrot"></foo>
>> Technically, text is nodes as all other element nodes.  In the
>> parrot example, there is no empty textnode but no textnode at all.
> 
> That is required by the xml standard?  If yes, elementtree is doing
> the right thing, but it surprises me, I would have expected an empty
> string.  Thanks.

The XML standard defines the two forms, "<foo/>" and "<foo></foo>", as
equivalent, so any XML parser handles them exactly the same way. Also, since
most XML parsers have a SAX(-like) interface, which always generates start and
end events as if the input were in the "<foo></foo>" form, there is not even a
way for applications or libraries to distinguish between the two.
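
To make the equivalence concrete, here is a minimal sketch using only the
standard library (xml.sax and xml.etree.ElementTree). The names Recorder and
sax_events are made up for this illustration and are not part of any library.
It records the SAX events for both spellings of the element and parses both
with ElementTree:

```python
import xml.etree.ElementTree as ET
import xml.sax


class Recorder(xml.sax.ContentHandler):
    """Hypothetical helper: collects the SAX events it receives in a list."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def characters(self, content):
        self.events.append(("characters", content))

    def endElement(self, name):
        self.events.append(("end", name))


def sax_events(xml_bytes):
    """Parse xml_bytes with the default SAX parser and return the event list."""
    handler = Recorder()
    xml.sax.parseString(xml_bytes, handler)
    return handler.events


short_form = b'<foo bar="parrot"/>'
long_form = b'<foo bar="parrot"></foo>'

# Both spellings produce one start event and one end event, with no
# character data in between.
events_short = sax_events(short_form)
events_long = sax_events(long_form)

# ElementTree therefore cannot tell them apart either.
text_short = ET.fromstring(short_form).text
text_long = ET.fromstring(long_form).text
```

With the parsers shipped in the standard library, the two event lists compare
equal and both .text values are None.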

So it's not even an ElementTree thing. ET just doesn't know what exactly was
in the original XML byte stream. A very simple way to make sure you always get
a string back is

    >>> text = element.text or ""
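
If you need this in several places, the idiom can be wrapped in a small
function. This is only a sketch; text_of is a hypothetical name, not an
ElementTree API. For lookups by path, the existing findtext() method already
returns an empty string when the matching element has no text, and its default
(None unless you pass one) when nothing matches:

```python
import xml.etree.ElementTree as ET


def text_of(element):
    """Hypothetical helper: return element.text, or "" if it is None."""
    return element.text or ""


root = ET.fromstring('<root><foo bar="parrot"></foo><baz>spam</baz></root>')

foo_text = text_of(root.find("foo"))   # element without text
baz_text = text_of(root.find("baz"))   # element with text

# findtext() applies the same normalisation for elements it finds ...
foo_via_findtext = root.findtext("foo")
# ... and returns the default (None here) when there is no such element.
missing = root.findtext("nope")
```

Note that text_of() expects an actual element; root.find() returns None for a
missing child, which you would have to check for separately.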

BTW, you'd be even more surprised to see that ET can actually /store/ "" as
text if you tell it to, and then returns an empty string when you ask for the
.text property. But any empty text coming from the parser will always be None.
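
A short sketch of that difference, assuming the standard-library
xml.etree.ElementTree module (the variable names are only for illustration):

```python
import xml.etree.ElementTree as ET

# Text coming from the parser: an empty element has no text at all.
parsed = ET.fromstring("<foo></foo>")
parsed_text = parsed.text

# Text set by hand: ElementTree stores whatever you assign, including "".
built = ET.Element("foo")
fresh_text = built.text      # nothing assigned yet
built.text = ""
stored_text = built.text     # the empty string you stored
```

Here parsed_text and fresh_text are None, while stored_text is the empty
string.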

Oh, and lxml.etree behaves exactly the same as ElementTree here. :)

Stefan


