[Tutor] how to extract text by specifying an element using ElementTree

Fri Dec 9 20:32:27 CET 2005

Hi group,
  I have another question about parsing XML files. I
found it very easy to parse XML files with Kent and
Danny's help.

I realized that all my XML files contain '\t', '\n'
and other whitespace characters. These extra characters
make it very difficult to extract the text data from the
XML files. I can make the XML parser work when I
remove '\n' and '\t' from the XML files.

Is there an easy way to get rid of the '\n' and '\t'
characters from XML files?
Thank you very much.
MDan
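
One possible approach, sketched here under some assumptions: rather than editing the XML files, leave them as they are and clean each element's text after parsing. The sample document and the helper name clean_text() below are hypothetical (the real contents of 00001.xml are not shown in this thread), and the sketch uses the standard-library xml.etree.ElementTree module instead of the separate elementtree package used in the quoted code.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real files (e.g. 00001.xml) are not shown in the thread.
# The \t and \n escapes become real tabs and newlines inside the XML text.
SAMPLE = """<annotation>
\t<biological_process>
\t\tsignal   transduction
\t</biological_process>
\t<biological_process>\n\tcell\tcycle\n</biological_process>
</annotation>"""


def clean_text(elem):
    """Return the element's text with every run of whitespace
    (spaces, tabs, newlines) collapsed to a single space."""
    # elem.text is None for an empty element, so fall back to "".
    return " ".join((elem.text or "").split())


root = ET.fromstring(SAMPLE)
# ".//name" matches elements with that tag at any depth below root.
processes = [clean_text(p) for p in root.findall(".//biological_process")]
print(processes)  # ['signal transduction', 'cell cycle']
```

If only the leading and trailing whitespace matters, elem.text.strip() may be enough. The split/join form also normalizes tabs and newlines that sit in the middle of the text.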

--- Kent Johnson <kent37 at tds.net> wrote:

> ps python wrote:
> > Kent and Danny,
> > Thanks for your replies.
> > 
> > Here fromstring() assumes that the input is in
> > some kind of text format.
> 
> Right, that is for the sake of a simple example.
> > 
> > What should I do when I am reading files
> > directly?
> > 
> > I am using the following:
> > 
> > from elementtree.ElementTree import ElementTree
> > mydata = ElementTree(file='00001.xml')
> > iter = mydata.getiterator()
> > 
> > Here the whole XML document is loaded as an element
> > tree. How do I get this iter into a form where I can
> > apply the findall() method?
> 
> Call findall() directly on mydata, e.g.
> for process in mydata.findall('//biological_process'):
>     print process.text
> 
> The path //biological_process means find any
> biological_process element at any depth from the
> root element.
> 
> Kent
