[Tutor] how to extract text by specifying an element using ElementTree

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Thu Dec 8 20:47:45 CET 2005



> For example:
>
> <biological_processess>
>    <biological_process>
>            Signal transduction
>    </biological_process>
>    <biological_process>
>            Energy process
>     </biological_process>
> </biological_processess>
>
> I looked at some tutorials (eg. Ogbuji).  Those
> examples described to extract all text of nodes and
> child nodes.

Hi Mdan,

The following might help:

    http://article.gmane.org/gmane.comp.python.tutor/24986
    http://mail.python.org/pipermail/tutor/2005-December/043817.html

The second post shows how we can use the findtext() method that
ElementTree elements provide.
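As a rough sketch of how findtext() might apply to the XML in your question: the snippet below assumes Python 3 and the standard-library xml.etree.ElementTree module rather than the separate elementtree package, and the variable names are just for illustration.

```python
# Sketch only: assumes Python 3 and the standard-library module
# xml.etree.ElementTree (not the separate "elementtree" package).
import xml.etree.ElementTree as ET

xml_text = """
<biological_processess>
    <biological_process>
        Signal transduction
    </biological_process>
    <biological_process>
        Energy process
    </biological_process>
</biological_processess>
"""

root = ET.fromstring(xml_text)

# findtext() returns the text of the FIRST matching direct child,
# or None if no child matches.
first = root.findtext("biological_process")
print(first.strip())

# To get the text of every matching child, loop over findall().
processes = [p.text.strip() for p in root.findall("biological_process")]
print(processes)
```

The strip() calls are there because the text in the sample document is surrounded by newlines and indentation.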

Here's another example that demonstrates how we can treat elements as
sequences of their subelements:

##################################################################
from elementtree import ElementTree

text = """
<people>
    <person>
        <lastName>skywalker</lastName>
        <firstName>luke</firstName>
    </person>
    <person>
        <lastName>valentine</lastName>
        <firstName>faye</firstName>
    </person>
    <person>
        <lastName>reynolds</lastName>
        <firstName>mal</firstName>
    </person>
</people>
"""

people = ElementTree.fromstring(text)
for person in people:
    print "here's a person:",
    print person.findtext("firstName"), person.findtext('lastName')
##################################################################


Does this make sense?  The API allows us to treat an element as a sequence
that we can march across, and the loop above marches across every person
subelement in people.
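To make the "element as a sequence" idea a bit more concrete, here is a small sketch, again assuming Python 3 and the standard-library xml.etree.ElementTree. It shows that len(), indexing, and iteration all work on the direct subelements.

```python
# Sketch only: assumes Python 3 and the standard-library ElementTree.
import xml.etree.ElementTree as ET

people = ET.fromstring("""
<people>
    <person><lastName>skywalker</lastName><firstName>luke</firstName></person>
    <person><lastName>valentine</lastName><firstName>faye</firstName></person>
    <person><lastName>reynolds</lastName><firstName>mal</firstName></person>
</people>
""")

# len() counts the direct subelements of an element.
print(len(people))

# Indexing picks out a single subelement by position.
first_person = people[0]
print(first_person.tag)

# Iterating visits each direct subelement in document order.
names = [(p.findtext("firstName"), p.findtext("lastName")) for p in people]
print(names)
```

Only direct children take part in this sequence behaviour: each person shows up when we loop over people, but the firstName and lastName elements inside it do not.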


Another way we could have written the loop above would be:

###########################################
>>> for person in people.findall('person'):
...     print person.find('firstName').text,
...     print person.find('lastName').text
...
luke skywalker
faye valentine
mal reynolds
###########################################
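One thing to watch for with find(): it returns None when there is no matching child, so asking for .text on the result would raise an AttributeError. The sketch below (Python 3, standard-library ElementTree; the "cobb" record and the "(unknown)" placeholder are made up for illustration) shows two ways we might guard against that.

```python
# Sketch only: assumes Python 3 and the standard-library ElementTree.
# The second <person> is hypothetical data with no <firstName> child.
import xml.etree.ElementTree as ET

people = ET.fromstring("""
<people>
    <person><lastName>skywalker</lastName><firstName>luke</firstName></person>
    <person><lastName>cobb</lastName></person>
</people>
""")

rows = []
for person in people.findall("person"):
    # find() gives back an element or None, so check before using .text.
    first_el = person.find("firstName")
    first = first_el.text if first_el is not None else "(unknown)"

    # findtext() accepts a default that is returned when nothing matches.
    last = person.findtext("lastName", default="(unknown)")

    rows.append((first, last))

print(rows)
```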


Or we might go a little funkier, and just get the first names anywhere in
people:

###########################################
>>> for firstName in people.findall('.//firstName'):
...     print firstName.text
...
luke
faye
mal
###########################################

where the subelement "tag" that we're giving findall is really an
XPath-style query.  ".//firstName" is a query in XPath format that says
"Give me all the firstName elements anywhere within the current element."
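To see what the ".//" prefix buys us, it may help to compare three queries side by side. This is a sketch assuming Python 3 and the standard-library xml.etree.ElementTree; a bare tag name only looks at direct children, while a path can step down through them.

```python
# Sketch only: assumes Python 3 and the standard-library ElementTree.
import xml.etree.ElementTree as ET

people = ET.fromstring("""
<people>
    <person><lastName>skywalker</lastName><firstName>luke</firstName></person>
    <person><lastName>valentine</lastName><firstName>faye</firstName></person>
    <person><lastName>reynolds</lastName><firstName>mal</firstName></person>
</people>
""")

# A bare tag matches direct children only; <people> has no direct
# <firstName> children, so this list is empty.
direct = people.findall("firstName")
print(len(direct))

# A slash-separated path walks down one level at a time.
by_path = [e.text for e in people.findall("person/firstName")]
print(by_path)

# ".//" searches all descendants of the current element.
anywhere = [e.text for e in people.findall(".//firstName")]
print(anywhere)
```

For this particular document the last two queries give the same result, since every firstName sits directly inside a person.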


The documentation in:

    http://effbot.org/zone/element.htm#searching-for-subelements

should also be helpful.


If you have more questions, please feel free to ask.


