problems with xml parsing (python 3.3)

Dieter Maurer dieter at handshake.de
Sun Oct 28 03:30:36 EDT 2012


jannidis at gmail.com writes:

> I am new to Python and have a problem with the behaviour of the xml parser. Assume we have this xml document:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <bibliography>
>     <entry>
>             Title of the first book.
>         </entry>
>         <entry>
>             <coauthored/>
> Title of the second book.
>         </entry>
> </bibliography>    
>
>
> If I now check for the text of all 'entry' nodes, the text for the node with the empty element isn't shown.
>
>
>
> import xml.etree.ElementTree as ET
> tree = ET.ElementTree(file='test.xml')
> root = tree.getroot()
> resultSet = root.findall(".//entry")
> for r in resultSet:
>     print(r.text)

I do not know about "xml.etree", but the (reportedly) quite compatible
"lxml.etree" handles text nodes quite differently from the "DOM":
they are *not* considered children of the parent element. Instead,
they are stored in the "text" and "tail" attributes: text that comes
before the first child element is the "text" of the containing
element, and text that follows an element is the "tail" of that
preceding element.
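
To illustrate the text/tail model, here is a small sketch using the
standard library's xml.etree.ElementTree rather than lxml (lxml exposes
the same "text" and "tail" attributes). The sample markup is made up
for illustration only.

```python
import xml.etree.ElementTree as ET

# A made-up fragment with text before, between and after child elements.
root = ET.fromstring("<p>before <b>bold</b> middle <i>italic</i> after</p>")

# Text preceding the first child belongs to the parent's "text".
print(repr(root.text))  # 'before '

for child in root:
    # A child's own content is in its "text"; the text following the
    # child (up to the next sibling or the parent's end) is its "tail".
    print(child.tag, repr(child.text), repr(child.tail))
```

The loop prints "b 'bold' ' middle '" and then "i 'italic' ' after'":
" middle " hangs off the <b> element, not off <p>.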

Your code snippet suggests that "xml.etree" behaves identically in
this respect. In that case, you would find "Title of the second book"
in the "tail" attribute of the "coauthored" element, while the "text"
of the second "entry" holds only the whitespace before "<coauthored/>".
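
A minimal sketch of how the loop from the question could also pick up
the second title, assuming the standard library xml.etree.ElementTree
and the document from the question inlined as a string instead of read
from 'test.xml'. It relies on itertext(), which yields an element's
text together with the text and tail of all its descendants.

```python
import xml.etree.ElementTree as ET

# The document from the question, inlined instead of read from 'test.xml'.
XML = """<?xml version="1.0" encoding="UTF-8"?>
<bibliography>
    <entry>
            Title of the first book.
        </entry>
        <entry>
            <coauthored/>
Title of the second book.
        </entry>
</bibliography>"""

root = ET.fromstring(XML.encode("utf-8"))

titles = []
for entry in root.findall(".//entry"):
    # itertext() walks entry.text plus the text and tail of every
    # descendant, so the title stored in <coauthored/>'s tail is included.
    full_text = "".join(entry.itertext()).strip()
    titles.append(full_text)
    print(full_text)

# Alternatively, read the tail of the empty element directly.
coauthored = root.find(".//coauthored")
print(coauthored.tail.strip())
```

The strip() calls only remove the indentation whitespace; without them
the printed strings keep the newlines and spaces from the file.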
