[Tutor] how to extract text by specifying an element using ElementTree
Danny Yoo
dyoo at hkn.eecs.berkeley.edu
Thu Dec 8 20:47:45 CET 2005
> For example:
>
> <biological_processess>
> <biological_process>
> Signal transduction
> </biological_process>
> <biological_process>
> Energy process
> </biological_process>
> </biological_processess>
>
> I looked at some tutorials (eg. Ogbuji). Those
> examples described to extract all text of nodes and
> child nodes.
Hi Mdan,
The following might help:
http://article.gmane.org/gmane.comp.python.tutor/24986
http://mail.python.org/pipermail/tutor/2005-December/043817.html
The second post shows how we can use the findtext() method from an
ElementTree.
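As a rough sketch of that idea applied to the XML in your question: this assumes a modern Python, where ElementTree ships in the standard library as xml.etree.ElementTree, and the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# The XML from the question, with the tag spelling kept as posted.
text = """
<biological_processess>
  <biological_process>
    Signal transduction
  </biological_process>
  <biological_process>
    Energy process
  </biological_process>
</biological_processess>
"""

root = ET.fromstring(text)

# findtext() returns the text of the first matching subelement only.
first = root.findtext("biological_process").strip()

# findall() returns every matching direct child; .text holds its text.
processes = [node.text.strip() for node in root.findall("biological_process")]

print(first)
print(processes)
```

The strip() calls are there because the text in the question is surrounded by newlines, which ElementTree keeps as part of the element's text.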
Here's another example that demonstrates how we can treat elements as
sequences of their subelements:
##################################################################
from elementtree import ElementTree
from StringIO import StringIO
text = """
<people>
  <person>
    <lastName>skywalker</lastName>
    <firstName>luke</firstName>
  </person>
  <person>
    <lastName>valentine</lastName>
    <firstName>faye</firstName>
  </person>
  <person>
    <lastName>reynolds</lastName>
    <firstName>mal</firstName>
  </person>
</people>
"""
people = ElementTree.fromstring(text)
for person in people:
    print "here's a person:",
    print person.findtext("firstName"), person.findtext('lastName')
##################################################################
Does this make sense? The API allows us to treat an element as a sequence
that we can march across, and the loop above marches across every person
subelement in people.
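To make the "sequence" idea a bit more concrete, here is a small sketch of the other sequence operations an element supports. It assumes the standard-library xml.etree.ElementTree of a modern Python rather than the separate elementtree package, and the variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

people = ET.fromstring(
    "<people>"
    "<person><lastName>skywalker</lastName><firstName>luke</firstName></person>"
    "<person><lastName>valentine</lastName><firstName>faye</firstName></person>"
    "<person><lastName>reynolds</lastName><firstName>mal</firstName></person>"
    "</people>"
)

count = len(people)        # number of direct subelements
first_person = people[0]   # indexing gives the first <person>
last_two = people[1:]      # slicing gives a list of elements

# A <person> is itself a sequence of its own subelements.
tags = [child.tag for child in first_person]

print(count, first_person.findtext("firstName"), tags)
```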
Another way we could have written the loop above would be:
###########################################
>>> for person in people.findall('person'):
...     print person.find('firstName').text,
...     print person.find('lastName').text
...
luke skywalker
faye valentine
mal reynolds
###########################################
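One practical difference between the two styles shows up when a subelement is missing: find() returns None, so asking for .text fails, while findtext() can fall back to a default. The sketch below assumes the standard-library xml.etree.ElementTree; the sample data and the "(unknown)" placeholder are invented for the example.

```python
import xml.etree.ElementTree as ET

# The second person deliberately has no <firstName>.
people = ET.fromstring(
    "<people>"
    "<person><lastName>skywalker</lastName><firstName>luke</firstName></person>"
    "<person><lastName>reynolds</lastName></person>"
    "</people>"
)

names = []
for person in people.findall("person"):
    # findtext() accepts a default that is returned when nothing matches.
    first = person.findtext("firstName", default="(unknown)")
    last = person.findtext("lastName", default="(unknown)")
    names.append((first, last))

# find() returns None for the missing element, so .text on it
# would raise AttributeError.
missing = people[1].find("firstName")

print(names, missing)
```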
Or we might go a little funkier, and just get the first names anywhere in
people:
###########################################
>>> for firstName in people.findall('.//firstName'):
...     print firstName.text
...
luke
faye
mal
###########################################
where the subelement "tag" that we're giving findall is really an
XPath-style query (ElementTree supports only a subset of XPath).
".//firstName" is a query in XPath format that says "Give me all the
firstName elements anywhere within the current element."
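For comparison, here is a short sketch of how a few different path expressions behave on the same tree. Again this assumes the standard-library xml.etree.ElementTree, and the variable names are only for illustration.

```python
import xml.etree.ElementTree as ET

people = ET.fromstring(
    "<people>"
    "<person><lastName>skywalker</lastName><firstName>luke</firstName></person>"
    "<person><lastName>valentine</lastName><firstName>faye</firstName></person>"
    "<person><lastName>reynolds</lastName><firstName>mal</firstName></person>"
    "</people>"
)

# ".//firstName" searches the whole subtree below the current element.
anywhere = [e.text for e in people.findall(".//firstName")]

# "person/firstName" follows only that exact child path.
by_path = [e.text for e in people.findall("person/firstName")]

# A plain tag name matches direct children only, so this finds nothing:
# the firstName elements are grandchildren of <people>.
direct = people.findall("firstName")

# iter() is a non-XPath way to walk the subtree for one tag.
walked = [e.text for e in people.iter("firstName")]

print(anywhere, by_path, direct, walked)
```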
The documentation in:
http://effbot.org/zone/element.htm#searching-for-subelements
should also be helpful.
If you have more questions, please feel free to ask.