[Tutor] how to extract text by specifying an element using ElementTree
ps python
ps_python3 at yahoo.co.in
Tue Dec 20 18:26:39 CET 2005
Dear Drs. Johnson and Yoo,
For the last week I have been working on parsing
the elements from a bunch of XML files, following your
suggestions.
Until now I have been unsuccessful. I have no clue why
I am failing.
I have ~16K XML files. This data was obtained from Johns
Hopkins University (of course these are public data,
and their use is allowed for teaching and non-commercial
purposes).
>>> from elementtree.ElementTree import ElementTree
>>> mydata = ElementTree(file='00004.xml')
>>> for process in mydata.findall('//biological_process'):
...     print process.text
...
>>> for proc in mydata.findall('functions'):
...     print proc
...
>>>
I do not understand why I am unable to parse this
file. I wondered whether the file is not well formed,
but I feel it is properly structured,
and yet it is unparsable.
Would you please help me and guide me as to what the problem
is? Apologies if I am completely overlooking something.
PS: Attached is the XML file that I am using.
--- Kent Johnson <kent37 at tds.net> wrote:
> ps python wrote:
> > Kent and Dany,
> > Thanks for your replies.
> >
> > Here fromstring() assumes that the input is in a kind
> > of text format.
>
> Right, that is for the sake of a simple example.
> >
> > what should be the case when I am reading files
> > directly.
> >
> > I am using the following :
> >
> > from elementtree.ElementTree import ElementTree
> > mydata = ElementTree(file='00001.xml')
> > iter = mydata.getiterator()
> >
> > Here the whole XML document is loaded as an element tree.
> > How should I get this iter into a format where I can
> > apply the findall() method?
>
> Call findall() directly on mydata, e.g.
> for process in mydata.findall('//biological_process'):
>     print process.text
>
> The path //biological_process means find any biological_process element
> at any depth from the root element.
>
> Kent
>
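For reference, the two approaches discussed in the quoted reply (parsing from a string with fromstring() and loading a file into an element tree, then calling findall() with an "any depth" path) can be sketched as below. This is only an illustration under assumptions: it uses the standard-library xml.etree.ElementTree module rather than the separate elementtree package, the SAMPLE document and the process_texts helper are hypothetical (the real 00004.xml is not shown here), and the path is written as './/biological_process' because current versions reject or warn about a leading '//'.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document; the real 00004.xml is not reproduced here.
SAMPLE = """<entry>
  <functions>
    <biological_process>signal transduction</biological_process>
    <biological_process>cell adhesion</biological_process>
  </functions>
</entry>"""


def process_texts(tree_or_element):
    """Return the text of every biological_process element at any depth."""
    return [el.text for el in tree_or_element.findall(".//biological_process")]


# From a string: fromstring() returns the root Element.
root = ET.fromstring(SAMPLE)

# From a file: parse() returns an ElementTree wrapping the root.
# io.StringIO stands in for a filename such as '00004.xml'.
tree = ET.parse(io.StringIO(SAMPLE))

print(process_texts(root))
print(process_texts(tree))
```

If the same call returns an empty list on a real file, printing sorted({el.tag for el in tree.iter()}) shows the tag names as ElementTree stores them. A document that declares a default namespace will show tags in the form {uri}name, and a path without that prefix will not match them.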
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 00004.xml
Type: text/xml
Size: 10855 bytes
Desc: 1023413501-00004.xml
Url : http://mail.python.org/pipermail/tutor/attachments/20051220/cfdc0134/00004.bin