splitting an XML file on the basis of XML tags
Stefan Behnel
stefan_ml at behnel.de
Mon Apr 7 01:59:33 EDT 2008
bijeshn at gmail.com wrote:
> Hi all,
>
> I have an XML file with the following structure:
>
> <r1>
>   <r2> ------------+
>     <r3>           |
>       <r4>         |
>         .          |
>         .          +---> constitutes one record.
>         .          |
>       </r4>        |
>     </r3>          |
>   </r2> -----------+
>   <r2>
>     .
>     .              ----> there are n records in between....
>     .
>   </r2>
>   <r2> ------------+
>     <r3>           |
>       <r4>         |
>         .          |
>         .          +---> constitutes one record.
>         .          |
>       </r4>        |
>     </r3>          |
>   </r2> -----------+
> </r1>
>
>
> Here <r1> is the main root tag of the XML, and <r2>...</r2>
> constitutes one record. What I would like to do is to extract
> everything (XML tags and data) between the nth <r2> tag and the
> (n+k)th <r2> tag. The extracted data is to be written down to a
> separate file.
What do you mean by "written down to a separate file"? Do you have a specific
format in mind?
In general, you can try this:
>>> from xml.etree import cElementTree as ET
>>> itercontext = ET.iterparse("thefile.xml", events=("start", "end"))
>>> event, root = itercontext.next()  # first event is the start of the root element
>>> for event, element in itercontext:
...     if event == "end" and element.tag == "r2":
...         print ET.tostring(element)  # write record subtree as XML
...         root.clear()  # one record done, clean up everything
http://effbot.org/zone/element-iterparse.htm
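For the original question (records n through n+k, written to a separate file), the same iterparse approach could be sketched roughly as follows in Python 3 syntax. The function names extract_records and write_records, the 0-based record index, the choice to wrap the output in a fresh <r1> root, and the sample data are all assumptions made for illustration, not something the question specifies.

```python
import io
from xml.etree import ElementTree as ET


def extract_records(source, first, count, record_tag="r2"):
    """Return `count` records starting at 0-based index `first`, as XML strings.

    `source` is assumed to be a binary file object opened by the caller.
    """
    selected = []
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    index = 0
    for event, element in context:
        if event == "end" and element.tag == record_tag:
            if first <= index < first + count:
                element.tail = None  # drop whitespace that follows the record
                selected.append(ET.tostring(element, encoding="unicode"))
            index += 1
            root.clear()  # one record done, free the memory it used
            if index >= first + count:
                break  # nothing further is needed from the input
    return selected


def write_records(records, out, root_tag="r1"):
    """Write the records to a text file object, wrapped in a new root element."""
    out.write("<%s>\n" % root_tag)
    for record in records:
        out.write(record + "\n")
    out.write("</%s>\n" % root_tag)


# Made-up sample input with four records.
sample = io.BytesIO(
    b"<r1>"
    b"<r2><r3><r4>a</r4></r3></r2>"
    b"<r2><r3><r4>b</r4></r3></r2>"
    b"<r2><r3><r4>c</r4></r3></r2>"
    b"<r2><r3><r4>d</r4></r3></r2>"
    b"</r1>"
)
records = extract_records(sample, first=1, count=2)

output = io.StringIO()  # stand-in for open("subset.xml", "w")
write_records(records, output)
```

With a real input you would pass open("thefile.xml", "rb") as the source and an output file opened for writing instead of the in-memory objects. This sketch assumes <r2> elements are never nested inside other <r2> elements.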
You can also do things like
...         print element.findtext("r3/r4")
Read the ElementTree tutorial to learn how to extract your data:
http://effbot.org/zone/element.htm#searching-for-subelements
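As a small illustration of the subelement search, here is a sketch in Python 3 syntax run against a single made-up record. The sample content and the variable names are assumptions for the example only.

```python
from xml.etree import ElementTree as ET

# A single made-up record with the structure from the question.
record = ET.fromstring("<r2><r3><r4>some data</r4></r3></r2>")

# findtext() follows a simple path relative to the element it is called on.
text = record.findtext("r3/r4")

# A path that matches nothing returns the default (None unless given).
missing = record.findtext("r3/nope", default="")

# iter() walks the whole subtree and yields every element with that tag.
all_r4 = [element.text for element in record.iter("r4")]
```

Here findtext() returns the text of the first match only, so iter() or findall() is the better fit when a record may hold several <r4> elements.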
Stefan