[XML-SIG] dom.minidom getting the text content of a node

Rick Hurst rick.hurst at gmail.com
Thu Dec 9 16:21:07 CET 2004


Hi,

I'm trying to migrate my blogworksXML blog to Zope/Plone. All the
blog content is stored in XML files, and I am trying to walk through it
using minidom and extract the relevant info. I can walk through and read
attributes, but I can't get at the text content stored inside the nodes
that hold the title and body text.

The node in question looks like this:

<blogtitle><![CDATA[Plone: remove member self registration]]></blogtitle>

I'm trying the following (I'm a Python newbie, BTW):

from xml.dom.minidom import parse
dom1 = parse('foo.xml')

for node in dom1.getElementsByTagName("blog"):
    id = node.getAttribute("id")
    print(id)
    for contentNode in node.getElementsByTagName("text"):
        # search inside the <text> element rather than the whole <blog>
        for titleNode in contentNode.getElementsByTagName("blogtitle"):
            print(titleNode.nodeName)   # prints "blogtitle"
            print(titleNode.nodeType)   # prints 1
            # print(titleNode.data)     # AttributeError: Element instance has no attribute 'data'
            print(titleNode.nodeValue)  # prints "None"

Is there a way of doing this with minidom, or do I need to use a
different parser? Any advice appreciated!

The XML is as follows:

<?xml version="1.0" encoding="ISO-8859-1"?>
<baef version="1.0">
<blog id="1087570241010">
<author>
<authorname>rick</authorname>
<authormail></authormail>
</author>
<information>
<commentthread>1087570241010</commentthread>
<timestamp>1087570241</timestamp>
<language>en</language>
<categories>
<category id=""></category>
</categories>
</information>
<text>
<blogtitle>
<![CDATA[Plone: remove member self registration]]></blogtitle>
<blogbody><![CDATA[I wanted to set up a plone site with no (rest of
post removed for clarity)]]></blogbody>
</text>
</blog>
</baef>
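
One direction that might work, sketched here under the assumption that
minidom keeps the CDATA content in a child node of the <blogtitle>
element rather than on the element itself (which would explain the
nodeValue of None). The helper name get_text and the inlined SAMPLE
string are made up for illustration; SAMPLE is a trimmed copy of the
XML above, not the real file.

```python
from xml.dom.minidom import parseString

# Trimmed, hypothetical copy of the blog XML, inlined so the sketch
# runs without foo.xml on disk.
SAMPLE = """<baef version="1.0">
<blog id="1087570241010">
<text>
<blogtitle>
<![CDATA[Plone: remove member self registration]]></blogtitle>
</text>
</blog>
</baef>"""


def get_text(element):
    """Join the data of the element's direct text and CDATA children."""
    parts = []
    for child in element.childNodes:
        # Elements carry no data themselves; only text-like children do.
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            parts.append(child.data)
    # Drop the whitespace that surrounds the CDATA section in the file.
    return "".join(parts).strip()


dom = parseString(SAMPLE)
titles = {}
for blog in dom.getElementsByTagName("blog"):
    for title_node in blog.getElementsByTagName("blogtitle"):
        titles[blog.getAttribute("id")] = get_text(title_node)

print(titles)
```

If that assumption holds, the same helper should presumably work for
<blogbody> as well, since it only looks at the direct children of
whatever element it is given.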
-- 
Rick Hurst
http://hypothecate.co.uk

