xml.dom.minidom question
John
jdboy at mac.com
Mon May 20 09:06:16 EDT 2002
Hi everyone!
I am writing a little CGI script that pulls information out of an XML
file and inserts it into an XHTML template. I think I have pretty much
figured out how to do it, but I have run into one little problem.
The following are snippets of the code I am using:
from xml.dom.minidom import parse

# 'file' is the XML file containing the information I am processing
dom = parse(file)

# 'element' is the tag name of the element that I want to get the
# node content from
NodeContent = getText(dom.getElementsByTagName(element)[0].childNodes)
def getText(nodelist):
    rc = ""
    for node in nodelist:
        # I know this is not entirely correct for my purposes
        if node.nodeType == node.TEXT_NODE:
            rc = rc + node.data
    return rc
My problem is that this getText function (more or less copied from
http://www.python.org/doc/current/lib/dom-example.html) only returns the
text nodes. Let's say I have an XML file like this:
<page>
  <content>
    <p>Lovely SPAM is <a href="http://www.spam.com">here</a>.</p>
    <blockquote>
      I've got two legs.
    </blockquote>
  </content>
  ...
</page>
Now, if I passed the childNodes of the "content" element to my getText
function, it would return nothing useful (at most the whitespace between
the tags), because the direct children are element nodes and the actual
text sits one level deeper. Basically, I want getText to return
everything between the "content" tags as a string, regardless of whether
it is part of a text node or an element node.
I have tried taking care of this problem like this:
def getText(nodelist):
    rc = ""
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            rc = rc + node.data
        if node.nodeType == node.ELEMENT_NODE:  # added this
            rc = rc + node.data
    return rc
But of course, 'node.data' is not valid for element nodes.
How would I have to change getText to accomplish that?
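For reference, here is a minimal sketch of two directions this could take, depending on what "everything" should mean. The names get_text and get_inner_xml are hypothetical helpers made up for illustration, and the XML is a whitespace-free version of the sample above. The first variant recurses into element nodes and keeps only the text; the second serializes each child with minidom's toxml() so the markup is kept as well.

```python
from xml.dom.minidom import parseString


def get_text(nodelist):
    """Concatenate the text found at any depth, dropping the markup."""
    parts = []
    for node in nodelist:
        if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
            parts.append(node.data)
        elif node.nodeType == node.ELEMENT_NODE:
            # Element nodes have no .data; descend into their children.
            parts.append(get_text(node.childNodes))
    return "".join(parts)


def get_inner_xml(nodelist):
    """Serialize every child node, so tags and attributes are kept."""
    return "".join(node.toxml() for node in nodelist)


# Sample document (an assumption: same content as above, whitespace removed).
doc = parseString(
    "<page><content>"
    '<p>Lovely SPAM is <a href="http://www.spam.com">here</a>.</p>'
    "<blockquote>I've got two legs.</blockquote>"
    "</content></page>"
)
children = doc.getElementsByTagName("content")[0].childNodes

text_only = get_text(children)
with_markup = get_inner_xml(children)
print(text_only)
print(with_markup)
```

Since the goal is to drop the result into an XHTML template, the second variant is probably closer to what is wanted, but that is a guess about the template rather than something the script above confirms.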
Thanks in advance,
John