[Tutor] how to extract text by specifying an element using ElementTree
Srinivas Iyyer
srini_iyyer_bio at yahoo.com
Fri Dec 9 20:32:27 CET 2005
Hi group,
I have another question about parsing XML files. I
found it very easy to parse XML files with Kent and
Danny's help.
I realized that all my XML files contain '\t', '\n',
and other whitespace. These extra characters make it
very difficult to extract the text data from the XML
files. I can make the XML parser work when I
remove '\n' and '\t' from the XML files.
Is there an easy way to get rid of the '\n' and '\t'
characters in XML files?
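
A minimal sketch of one possible approach, not a definitive answer: instead of editing the files, the whitespace could be stripped from each element's text after parsing. This assumes the standard-library xml.etree.ElementTree module (the successor of the elementtree package used in this thread); the SAMPLE string and the clean_text() helper are hypothetical names made up for illustration.

```python
import xml.etree.ElementTree as ET  # stdlib successor of the elementtree package

# Hypothetical sample standing in for one of the XML files;
# note the '\n' and '\t' characters around the text.
SAMPLE = (
    "<entry>\n"
    "\t<biological_process>\n"
    "\t\tcell cycle\n"
    "\t</biological_process>\n"
    "</entry>"
)


def clean_text(element):
    """Return the element's text with whitespace runs collapsed ('' if no text)."""
    # str.split() with no argument splits on any run of spaces, tabs and newlines.
    return " ".join((element.text or "").split())


root = ET.fromstring(SAMPLE)
cleaned = [clean_text(el) for el in root.iter("biological_process")]
print(cleaned)
```

The XML files themselves stay untouched here; only the extracted strings are cleaned, so the same helper can be applied to whatever findall() returns.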
Thank you very much.
MDan
--- Kent Johnson <kent37 at tds.net> wrote:
> ps python wrote:
> > Kent and Danny,
> > Thanks for your replies.
> >
> > Here fromstring() assumes that the input is in a kind
> > of text format.
>
> Right, that is for the sake of a simple example.
> >
> > What should I do when I am reading files
> > directly?
> >
> > I am using the following:
> >
> > from elementtree.ElementTree import ElementTree
> > mydata = ElementTree(file='00001.xml')
> > iter = mydata.getiterator()
> >
> > Here the whole XML document is loaded as an element tree.
> > How should I turn this iter into a form where I can
> > apply the findall() method?
>
> Call findall() directly on mydata, e.g.
> for process in mydata.findall('//biological_process'):
>     print process.text
>
> The path //biological_process means find any
> biological_process element
> at any depth from the root element.
>
> Kent
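
The advice quoted above can be sketched end to end as follows. This is an illustration under stated assumptions rather than Kent's exact code: it uses the standard-library xml.etree.ElementTree module instead of the elementtree package, writes a hypothetical SAMPLE document to a temporary file as a stand-in for 00001.xml, and spells the path as './/biological_process', the form current versions of the module expect for "any depth below the root".

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical stand-in for 00001.xml: one nested and one top-level match.
SAMPLE = """<entry>
  <function>
    <biological_process>transport</biological_process>
  </function>
  <biological_process>signal transduction</biological_process>
</entry>"""

# Write the sample to a temporary file so the tree is loaded from disk,
# as in the thread.
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as handle:
    handle.write(SAMPLE)
    path = handle.name

try:
    mydata = ET.ElementTree(file=path)
    # findall() is called directly on the tree; no iterator is needed first.
    found = [process.text for process in mydata.findall(".//biological_process")]
finally:
    os.remove(path)

print(found)
```

Matches come back in document order, so the nested element is listed before the top-level one.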