[Tutor] how to extract text by specifying an element using ElementTree

ps python ps_python3 at yahoo.co.in
Tue Dec 20 18:26:39 CET 2005


Dear Drs. Johnson and Yoo,
For the past week I have been trying to parse
the elements from a bunch of XML files, following your
suggestions.

So far I have been unsuccessful, and I have no clue why
I am failing.

I have ~16K XML files. This data was obtained from Johns
Hopkins University (these are public data, and their
use is allowed for teaching and non-commercial
purposes).


>>> from elementtree.ElementTree import ElementTree
>>> mydata = ElementTree(file='00004.xml')
>>> for process in mydata.findall('//biological_process'):
	print process.text

	
>>> for proc in mydata.findall('functions'):
	print proc

	
>>> 



I do not understand why I am unable to parse this
file. I wondered whether the file is not well formed,
but as far as I can tell it is properly structured,
and yet I cannot extract anything from it.


Would you please help me or guide me as to what the problem
is? Apologies if I am overlooking something obvious.
 

PS: Attached is the XML file that I am using. 
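
One way to separate the two possibilities raised above (a file that is not well formed versus a path that simply matches nothing) is to parse inside a try/except and then print the tag names the parser actually stores. The following is only a sketch under stated assumptions: it uses the standard-library `xml.etree.ElementTree` rather than the standalone `elementtree` package, and the inline `SAMPLE` document with its `urn:example` namespace is hypothetical, standing in for the attachment, whose contents are not shown here. If the real file happens to declare a default namespace, every tag would be stored as `{namespace}name`, and a plain `biological_process` path would match nothing.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for 00004.xml; the real attachment is not reproduced here.
SAMPLE = """<entry xmlns="urn:example">
  <functions>
    <biological_process>Signal transduction</biological_process>
  </functions>
</entry>"""


def distinct_tags(xml_text):
    """Return the sorted distinct tag names, or None if the text is not well formed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None
    return sorted({elem.tag for elem in root.iter()})


tags = distinct_tags(SAMPLE)
print(tags)

# A plain path finds nothing here because the stored tags carry the namespace.
root = ET.fromstring(SAMPLE)
plain_matches = root.findall(".//biological_process")
qualified_matches = root.findall(".//{urn:example}biological_process")
print(len(plain_matches), len(qualified_matches))
```

If the printed tags turn out to have no `{...}` prefix, the namespace guess does not apply and the cause lies elsewhere.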

--- Kent Johnson <kent37 at tds.net> wrote:

> ps python wrote:
> >  Kent and Dany, 
> > Thanks for your replies.  
> > 
> > Here fromstring() assumes that the input is in a kind
> > of text format.
> 
> Right, that is for the sake of a simple example.
> > 
> > What should I do when I am reading files
> > directly?
> > 
> > I am using the following :
> > 
> > from elementtree.ElementTree import ElementTree
> > mydata = ElementTree(file='00001.xml')
> > iter = mydata.getiterator()
> > 
> > Here the whole XML document is loaded as an element tree.
> > How should I turn this iter into a form where I can
> > apply the findall() method?
> 
> Call findall() directly on mydata, e.g.
> for process in mydata.findall('//biological_process'):
>    print process.text
> 
> The path //biological_process means find any
> biological_process element 
> at any depth from the root element.
> 
> Kent
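
The difference between an any-depth path and a bare tag name can be seen in a small self-contained sketch. Everything here is an assumption for illustration: the `SAMPLE` document is invented (only the element names echo the thread), it has no namespace, and the code uses the standard-library `xml.etree.ElementTree`, where the any-depth path is written `.//name` with a leading dot.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document; element names echo the thread, the content is invented.
SAMPLE = """<entry>
  <functions>
    <biological_process>Apoptosis</biological_process>
    <group>
      <biological_process>Cell cycle</biological_process>
    </group>
  </functions>
</entry>"""

# ElementTree(file=...) accepts a filename or a file-like object.
mydata = ET.ElementTree(file=io.StringIO(SAMPLE))

# './/name' searches all descendants of the root, at any depth.
any_depth = [p.text for p in mydata.findall(".//biological_process")]

# A bare 'name' only looks at direct children of the root element.
direct_children = [f.tag for f in mydata.findall("functions")]
top_level_processes = mydata.findall("biological_process")

print(any_depth)
print(direct_children, len(top_level_processes))
```

In this sample the bare `biological_process` path returns an empty list because no such element sits directly under the root, while the any-depth path finds both.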
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 00004.xml
Type: text/xml
Size: 10855 bytes
Desc: 1023413501-00004.xml
Url : http://mail.python.org/pipermail/tutor/attachments/20051220/cfdc0134/00004.bin

