[XML-SIG] Learning to use elementtree

Doran, Harold HDoran at air.org
Tue Apr 8 15:48:05 CEST 2008


Thanks. I'm piecing this together slowly, but I did get the following to
work.

Test.py
from xml.etree.ElementTree import ElementTree as ET
f = open('test.txt', 'w')
et = ET(file='out_g4r_b.xml')
for statentityref in \
      et.findall('admin/responseanalyses/analysis/analysisdata/statentityref'):
   print >> f, statentityref.attrib['id']
   for statentityref in statentityref.findall('statentityref'):
      for statval in statentityref.findall('statval'):
         print >> f, statentityref.attrib['id'], '\t', \
            statval.attrib['type'], '\t', statval.attrib['value']
f.close()

And this gives output like:

13963
0.000000 	UncollapsedMeanScore 	23.863636
0.000000 	ScorePtPct 	0.018333
0.000000 	ScorePtBiserial 	-0.496309
0.000000 	ScorePtAdjBiserial 	-0.452588
1.000000 	UncollapsedMeanScore 	34.941426
1.000000 	ScorePtPct 	0.981667
1.000000 	ScorePtBiserial 	0.496309
1.000000 	ScorePtAdjBiserial 	0.452588
omit 	ScorePtPct 	0.000000
omit 	ScorePtBiserial 	-99999.990000
omit 	ScorePtAdjBiserial 	-99999.990000
13962
0.000000 	UncollapsedMeanScore 	29.305195
0.000000 	ScorePtPct 	0.256667
0.000000 	ScorePtBiserial 	-0.484469
0.000000 	ScorePtAdjBiserial 	-0.425165
1.000000 	UncollapsedMeanScore 	36.614350
1.000000 	ScorePtPct 	0.743333
1.000000 	ScorePtBiserial 	0.484469
1.000000 	ScorePtAdjBiserial 	0.425165
omit 	ScorePtPct 	0.000000
omit 	ScorePtBiserial 	-99999.990000
omit 	ScorePtAdjBiserial 	-99999.990000

...

This is almost exactly what I want, and I can live with it if needed.
What would be most convenient, however, is to format the output as
follows:

13963	0.000000 	UncollapsedMeanScore 	23.863636
13963	0.000000 	ScorePtPct 	0.018333
13963	0.000000 	ScorePtBiserial 	-0.496309
13963	0.000000 	ScorePtAdjBiserial 	-0.452588
13963	1.000000 	UncollapsedMeanScore 	34.941426
13963	1.000000 	ScorePtPct 	0.981667
13963	1.000000 	ScorePtBiserial 	0.496309
13963	1.000000 	ScorePtAdjBiserial 	0.452588

I think this may be what Cliff meant by name collisions. That is, the
number 13963 comes from the 'id' attribute of statentityref, but 0.000
and 1.0 also come from an 'id' attribute, that of the statentityref
nested inside it. So I'm a bit confused about how to print them side
by side.
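One way to get that layout is to store the outer id in its own variable and give the nested element a different loop name, so both ids are still in scope when each line is written. Below is a minimal sketch in Python 3 (the code in this thread is Python 2), run against a small hypothetical sample; the real out_g4r_b.xml isn't shown, so the nesting is assumed from the output above, and the names rows, item_id and category_id are illustrative labels, not part of ElementTree:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: nesting assumed from the output shown in this thread.
SAMPLE = """<root>
  <admin><responseanalyses><analysis><analysisdata>
    <statentityref id="13963">
      <statentityref id="0.000000">
        <statval type="UncollapsedMeanScore" value="23.863636"/>
        <statval type="ScorePtPct" value="0.018333"/>
      </statentityref>
      <statentityref id="1.000000">
        <statval type="UncollapsedMeanScore" value="34.941426"/>
      </statentityref>
    </statentityref>
  </analysisdata></analysis></responseanalyses></admin>
</root>"""

PATH = 'admin/responseanalyses/analysis/analysisdata/statentityref'


def rows(root):
    """Yield (item_id, category_id, stat_type, stat_value) for every statval."""
    for item in root.findall(PATH):                    # outer statentityref, e.g. 13963
        item_id = item.get('id')                       # keep it before descending
        for category in item.findall('statentityref'): # relative path: this item's children only
            category_id = category.get('id')           # e.g. 0.000000 or 1.000000
            for statval in category.findall('statval'):
                yield item_id, category_id, statval.get('type'), statval.get('value')


root = ET.fromstring(SAMPLE)
for row in rows(root):
    # Tab-separated; pass file=f to print() to write to an open file instead.
    print('\t'.join(row))
```

Because the outer element is called item and the nested one category, the inner loop no longer overwrites the variable that holds 13963.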


> -----Original Message-----
> From: Stefan Behnel [mailto:stefan_ml at behnel.de] 
> Sent: Monday, April 07, 2008 8:32 AM
> To: Doran, Harold
> Cc: J. Cliff Dyer; xml-sig at python.org
> Subject: Re: [XML-SIG] Learning to use elementtree
> 
> Hi,
> 
> Doran, Harold wrote:
> > Well, I think I'm getting close. But, I think this is similar to the
> > problem I had when I started. This seems to create a huge data file
> > with all information under the first item, and then again all
> > information under the second item and so forth.
> > 
> > for statentityref in \
> >       et.findall('admin/responseanalyses/analysis/analysisdata/statentityref'):
> >    print >> f, statentityref.attrib['id']
> >    for statentityref in \
> >          et.findall('admin/responseanalyses/analysis/analysisdata/statentityref/statentityref'):
> >       for statval in statentityref.findall('statval'):
> >          print >> f, statentityref.attrib['id'], '\t', \
> >             statval.attrib['type'], '\t', statval.attrib['value']
> 
> I think you should read the previous post again. You are 
> nesting three loops here where two would do what you want.
> 
> Stefan
> 
> 
> >> -----Original Message-----
> >> From: J. Cliff Dyer [mailto:jcd at unc.edu]
> >> Sent: Wednesday, April 02, 2008 3:36 PM
> >> To: Doran, Harold
> >> Cc: xml-sig at python.org
> >> Subject: Re: [XML-SIG] Learning to use elementtree
> >>
> >> On Wed, 2008-04-02 at 15:28 -0400, Doran, Harold wrote:
> >>> Indeed, navigating the xml is tough (for me). I have been able to get
> >>> the following to work. I put in "Sub Element" to indicate the new
> >>> section of data. But, from looking at the text output, one doesn't
> >>> know which item these sub elements belong to. I think the solution is
> >>> to create an index like 13965-0 to show that this is the
> >>> subinformation from the item above it. That seems to be where I am
> >>> getting stuck. Although, I am open to other suggestions on how to
> >>> best represent the output.
> >>>
> >>> from xml.etree.ElementTree import ElementTree as ET
> >>>
> >>> filename = raw_input("Please enter the AM XML file: ")
> >>> new_file = raw_input("Save this file as: ")
> >>>
> >>> # create a new file defined by the user
> >>> f = open(new_file, 'w')
> >>>
> >>> et = ET(file=filename)
> >>>
> >>> for statentityref in \
> >>>       et.findall('admin/responseanalyses/analysis/analysisdata/statentityref'):
> >>>     for statval in statentityref.findall('statval'):
> >>>       print >> f, statentityref.attrib['id'], '\t', \
> >>>          statval.attrib['type'], '\t', statval.attrib['value']
> >>>
> >>> f.write("\n\n")
> >>> f.write("Sub Element\n\n")
> >>>
> >>> for statentityref in \
> >>>       et.findall('admin/responseanalyses/analysis/analysisdata/statentityref/statentityref'):
> >>>     for statval in statentityref.findall('statval'):
> >>>       print >> f, statentityref.attrib['id'], '\t', \
> >>>          statval.attrib['type'], '\t', statval.attrib['value']
> >>> f.close()
> >> Do you want your second statentityref loop to be based on its parent
> >> statentityref?  If so, you need to nest it in the original loop, and
> >> use an xpath relative to your outer statentityref (and watch for name
> >> collisions).
> 
> 
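The "huge data file" described in the quoted messages, with every nested element repeated under every item, comes from using the full path from the root inside the outer loop, which is what Stefan and Cliff point at. A minimal sketch comparing the two kinds of inner path, on a hypothetical two-item sample whose nesting is assumed from this thread (the pairs helper and the item/child names are illustrative only):

```python
import xml.etree.ElementTree as ET

# Hypothetical two-item sample; nesting assumed from this thread.
SAMPLE = """<root><admin><responseanalyses><analysis><analysisdata>
  <statentityref id="13963">
    <statentityref id="0.000000"/><statentityref id="1.000000"/>
  </statentityref>
  <statentityref id="13962">
    <statentityref id="0.000000"/><statentityref id="1.000000"/>
  </statentityref>
</analysisdata></analysis></responseanalyses></admin></root>"""

PATH = 'admin/responseanalyses/analysis/analysisdata/statentityref'
root = ET.fromstring(SAMPLE)


def pairs(relative):
    """Return (outer id, nested id) pairs using a relative or full inner path."""
    out = []
    for item in root.findall(PATH):
        if relative:
            # Searched from the current item: only its own nested elements.
            children = item.findall('statentityref')
        else:
            # Searched from the root again: every nested statentityref in the
            # document, no matter which item the outer loop is on.
            children = root.findall(PATH + '/statentityref')
        for child in children:
            out.append((item.get('id'), child.get('id')))
    return out


print(len(pairs(relative=False)))  # 8: each of the 2 items gets all 4 nested elements
print(len(pairs(relative=True)))   # 4: each item gets only its own 2
```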

