Parsing XML - Newbie help

rh0dium sklass at pointcircle.com
Sun May 22 19:34:08 EDT 2005


Fredrik Lundh wrote:

> didn't you ask the same question a few days ago?  did you read the
> replies to that post?

Yes, I did, but the XML was malformed. Actually it still is, but you
helped me figure out a way to correct it. Thanks!

Here is what I have so far. Now I want to find a child of a child (I
think that's how you say it?). Below is a piece of the XML I am trying
to parse. In short, I want to figure out all of the memory in a
system: I can look at the "size" of every "bank:N" node and add them
up. I am having trouble getting to the children of the "System Memory"
node.

        import re
        import xml.etree.ElementTree as ET  # bundled with Python since 2.5

        with open("xml.test1") as inp:
            data = inp.read()
        # strip off bogus XML declaration
        m = re.match(r"<\?xml[^>]+>", data)
        if m:
            data = data[m.end():]
        # Apparently ampersands are common in lshw output; get rid of them.
        data = data.replace('&ampersand;', '')
        # wrap nodes in a container element
        data = "<doc>" + data + "</doc>"

        tree = ET.XML(data)

        for elem in tree.findall(".//node"):
            if elem.get("class") == "memory":
                if elem.findtext("description") == "System Memory":
                    print("Found System Memory node")
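ElementTree's path syntax distinguishes direct children from deeper
descendants, which is the crux of the "child of a child" question. A
minimal sketch using the standard-library xml.etree.ElementTree module
on a tiny made-up document (not lshw output); the helper `ids` is
hypothetical, added only for the illustration:

```python
import xml.etree.ElementTree as ET

# Made-up three-level document: a -> b -> c (all <node> elements).
root = ET.XML('<doc><node id="a"><node id="b"><node id="c"/></node></node></doc>')
parent = root.find("node")  # the element with id="a"


def ids(elems):
    """Hypothetical helper: list the id attribute of each element."""
    return [e.get("id") for e in elems]


print(ids(parent.findall("node")))       # direct children only: ['b']
print(ids(parent.findall("node/node")))  # a child of a child:   ['c']
print(ids(parent.findall(".//node")))    # all descendants:      ['b', 'c']
```

Note that "./node/node" means the same as "node/node": it skips the
direct children and returns only the grandchildren.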

OK, so up to here I am fine. I find two blocks of system memory (if
you want the full XML, let me know). It MUST be "System Memory" only.
Now, how do I get a list of all of the child "node" elements of each
block? They are named bank:N (i.e. bank:0, bank:1, etc.; see below).
Each of those may or may not have some memory stuck in it; I can tell
there is memory because a size is given. I want a list of all of the
sizes. From there I can say: you have sum(memory) bytes in
len(memory) of the total banks.
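Once the sizes are collected, the tally described above is simple
arithmetic. A minimal sketch, assuming the sizes have already been
gathered into a list with one entry per bank (an int byte count, or
None for an empty slot); `summarize` is a hypothetical helper name:

```python
def summarize(sizes):
    """Return (total_bytes, populated_banks, total_banks).

    sizes holds one entry per bank: an int byte count,
    or None when the slot is empty (no <size> element).
    """
    populated = [s for s in sizes if s is not None]
    return sum(populated), len(populated), len(sizes)


# The four banks under memory:0 in the XML quoted in this post.
total, used, banks = summarize([None, None, 1073741824, 1073741824])
print("%d bytes in %d of %d banks" % (total, used, banks))
# prints: 2147483648 bytes in 2 of 4 banks
```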

Here is what I tried, but I am clearly messing up:

                    for mem in elem.findall("./node/node"):
                        if elem.get("class") == "memory":
                            print("Entering Memory Class")
                            if elem.findtext("size"):
                                print("Found size " + elem.findtext("size"))
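For comparison, here is one way the attempt above could be made to
work; it is a sketch, not a definitive answer. It makes two changes:
it uses the path "node", which selects the direct children of the
System Memory node (where the bank:N nodes live) instead of
"./node/node", which selects grandchildren; and inside the loop it
tests the child `mem` rather than the parent `elem`. It runs against a
trimmed copy of the XML quoted in this post; `system_memory_bank_sizes`
is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the lshw XML quoted in this post (some banks omitted).
DATA = """<doc>
<node id="memory:0" class="memory">
  <description>System Memory</description>
  <node id="bank:0" class="memory">
    <description>DIMM DDR Synchronous [empty]</description>
  </node>
  <node id="bank:2" class="memory">
    <description>DIMM DDR Synchronous 400 MHz (2.5 ns)</description>
    <size units="bytes">1073741824</size>
  </node>
</node>
<node id="memory:2" class="memory">
  <description>Flash Memory</description>
  <node id="bank" class="memory">
    <size units="bytes">1048576</size>
  </node>
</node>
</doc>"""


def system_memory_bank_sizes(tree):
    """One entry per bank under each "System Memory" node: the size in
    bytes as an int, or None if the bank has no <size> element."""
    sizes = []
    for elem in tree.findall(".//node"):
        if (elem.get("class") == "memory"
                and elem.findtext("description") == "System Memory"):
            # "node" selects the direct children of elem: the bank:N nodes.
            for mem in elem.findall("node"):
                # Test the child (mem), not the parent (elem).
                if mem.get("class") == "memory":
                    size = mem.findtext("size")
                    sizes.append(int(size) if size else None)
    return sizes


tree = ET.XML(DATA)
print(system_memory_bank_sizes(tree))  # [None, 1073741824]
```

The Flash Memory bank is skipped because its parent's description is
not "System Memory".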

And here is the XML that goes with it:

       <node id="memory:0" claimed="true" class="memory" handle="DMI:0027">
          <description>System Memory</description>
          <physid>27</physid>
          <slot>System board or motherboard</slot>
          <node id="bank:0" claimed="true" class="memory" handle="DMI:002C">
             <description>DIMM DDR Synchronous [empty]</description>
             <vendor>JEDEC ID:</vendor>
             <physid>0</physid>
             <slot>DIMM3B</slot>
          </node>
          <node id="bank:1" claimed="true" class="memory" handle="DMI:002D">
             <description>DIMM DDR Synchronous [empty]</description>
             <vendor>JEDEC ID:</vendor>
             <physid>1</physid>
             <slot>DIMM3A</slot>
          </node>
          <node id="bank:2" claimed="true" class="memory" handle="DMI:002E">
             <description>DIMM DDR Synchronous 400 MHz (2.5 ns)</description>
             <product>M3 12L2920BG0-CCC</product>
             <vendor>JEDEC ID:CE 00 00 00 00 00 00 00</vendor>
             <physid>2</physid>
             <serial>96000241</serial>
             <slot>DIMM1B</slot>
             <size units="bytes">1073741824</size>
             <width units="bits">64</width>
             <clock units="Hz">400000000</clock>
          </node>
          <node id="bank:3" claimed="true" class="memory" handle="DMI:002F">
             <description>DIMM DDR Synchronous 400 MHz (2.5 ns)</description>
             <product>M3 12L2920BG0-CCC</product>
             <vendor>JEDEC ID:CE 00 00 00 00 00 00 00</vendor>
             <physid>3</physid>
             <serial>4A000741</serial>
             <slot>DIMM1A</slot>
             <size units="bytes">1073741824</size>
             <width units="bits">64</width>
             <clock units="Hz">400000000</clock>
          </node>
       </node>
       <node id="memory:1" claimed="true" class="memory" handle="DMI:0028">
          <description>System Memory</description>
          <physid>28</physid>
          <slot>System board or motherboard</slot>
          <node id="bank:0" claimed="true" class="memory" handle="DMI:0030">
             <description>DIMM DDR Synchronous [empty]</description>
             <vendor>JEDEC ID:</vendor>
             <physid>0</physid>
             <slot>DIMM4B</slot>
          </node>
          <node id="bank:1" claimed="true" class="memory" handle="DMI:0031">
             <description>DIMM DDR Synchronous [empty]</description>
             <vendor>JEDEC ID:</vendor>
             <physid>1</physid>
             <slot>DIMM4A</slot>
          </node>
          <node id="bank:2" claimed="true" class="memory" handle="DMI:0032">
             <description>DIMM DDR Synchronous 400 MHz (2.5 ns)</description>
             <product>M3 12L2920BG0-CCC</product>
             <vendor>JEDEC ID:CE 00 00 00 00 00 00 00</vendor>
             <physid>2</physid>
             <serial>95000041</serial>
             <slot>DIMM2B</slot>
             <size units="bytes">1073741824</size>
             <width units="bits">64</width>
             <clock units="Hz">400000000</clock>
          </node>
          <node id="bank:3" claimed="true" class="memory" handle="DMI:0033">
             <description>DIMM DDR Synchronous 400 MHz (2.5 ns)</description>
             <product>M3 12L2920BG0-CCC</product>
             <vendor>JEDEC ID:CE 00 00 00 00 00 00 00</vendor>
             <physid>3</physid>
             <serial>58000E41</serial>
             <slot>DIMM2A</slot>
             <size units="bytes">1073741824</size>
             <width units="bits">64</width>
             <clock units="Hz">400000000</clock>
          </node>
       </node>
       <node id="memory:2" class="memory" handle="DMI:0029">
          <description>Flash Memory</description>
          <physid>29</physid>
          <slot>System board or motherboard</slot>
          <capacity units="bytes">1048576</capacity>
          <node id="bank" class="memory" handle="DMI:0035">
             <description>Chip FLASH Non-volatile</description>
             <physid>0</physid>
             <slot>SYSTEM ROM</slot>
             <size units="bytes">1048576</size>
             <width units="bits">4</width>
          </node>
       </node>
       <node id="memory:3" class="memory" handle="">
          <physid>b</physid>
       </node>
       <node id="memory:4" class="memory" handle="">
          <physid>c</physid>
       </node>
       <node id="memory:5" class="memory" handle="PCI:00:00.0">
          <description>Memory controller</description>
          <product>CK804 Memory Controller</product>
          <vendor>nVidia Corporation</vendor>
          <physid>0</physid>
          <businfo>pci@00:00.0</businfo>
          <version>a3</version>
          <width units="bits">32</width>
          <clock units="Hz">66000000</clock>
          <capabilities>
             <capability id="bus_master" >bus mastering</capability>
             <capability id="cap_list" >PCI capabilities listing</capability>
          </capabilities>
       </node>


Thanks so much. PS: XML can be a real PITA when the data you throw at
it is not well-formed. I had actually started working with sgmllib
after I saw a similar thread, but I ran into the same problem (child
of a child).
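On the malformed-input point: the script above simply deletes the
offending ampersand text before parsing. A possible alternative,
sketched here under the assumption that the problem is bare "&"
characters in lshw's text content, is to escape every "&" that does
not already begin an entity or character reference, so the text
survives parsing. The regex and the helper name
`escape_bare_ampersands` are illustrative, not part of ElementTree,
and an undefined named entity such as "&foo;" would still be rejected
by the parser:

```python
import re
import xml.etree.ElementTree as ET

# Matches an "&" that does NOT already start a reference such as
# &amp;, &#38; or &#x26; (the negative lookahead checks what follows).
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#\d+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(text):
    """Hypothetical helper: replace bare '&' with '&amp;'."""
    return BARE_AMP.sub("&amp;", text)


fixed = escape_bare_ampersands("<vendor>Foo & Bar &amp; Co</vendor>")
print(fixed)               # <vendor>Foo &amp; Bar &amp; Co</vendor>
print(ET.XML(fixed).text)  # Foo & Bar & Co
```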

Thanks again.




More information about the Python-list mailing list