xml.dom.minidom question

John jdboy at mac.com
Mon May 20 09:06:16 EDT 2002


Hi everyone!

I am writing a little CGI script that pulls information out of an XML 
file and inserts it into an XHTML template. I think I have pretty much 
figured out how to do it, but I have run into one little problem.

The following are snippets of the code I am using:

from xml.dom.minidom import parse

dom = parse(file)	# where 'file' is the XML file 
					# containing the information
					# I am processing

NodeContent = getText(dom.getElementsByTagName(element)[0].childNodes)
# where 'element' is the tag name of the element that I want to get the 
# node content from

def getText(nodelist):
	rc = ""
	for node in nodelist:
		if node.nodeType == node.TEXT_NODE:		# I know this is not entirely 
			rc = rc + node.data					# correct for my purposes
	return rc

My problem is that this getText function (more or less copied from 
http://www.python.org/doc/current/lib/dom-example.html) only returns the 
text nodes. Let's say I have an XML file like this:

<page>
	<content>
		<p>Lovely SPAM is <a href="http://www.spam.com">here</a>.</p>
		<blockquote>
			I've got two legs.
		</blockquote>
	</content>
	...
</page>

Now, if I passed the childNodes of the element with the tag name 
"content" to my getText function, it would return nothing useful (at 
most the whitespace between the tags), because the direct children are 
only element nodes and whitespace-only text nodes. Basically, I want 
getText to return everything in between the "content" tags as a string, 
regardless of whether it is part of a text node or an element node.

I have tried taking care of this problem like this:

def getText(nodelist):
	rc = ""
	for node in nodelist:
		if node.nodeType == node.TEXT_NODE:	
			rc = rc + node.data
		if node.nodeType == node.ELEMENT_NODE:	# added this
			rc = rc + node.data
	return rc

But of course, 'node.data' is not valid for element nodes.

How would I have to change getText to accomplish that?
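
For illustration, here is a minimal sketch of two directions that might 
work. It assumes that "everything in between the tags" can mean either 
the serialized markup or just the character data. The names SAMPLE, 
get_inner_xml and get_text_recursive are made up for this example, and 
parseString is used on an inline copy of the XML above only so that the 
snippet runs without a file. toxml() is the serialization method that 
minidom nodes provide.

```python
from xml.dom.minidom import parseString

# Inline copy of the example document, used instead of a file (assumption).
SAMPLE = """<page>
    <content>
        <p>Lovely SPAM is <a href="http://www.spam.com">here</a>.</p>
        <blockquote>
            I've got two legs.
        </blockquote>
    </content>
</page>"""


def get_inner_xml(nodelist):
    """Serialize every node in nodelist, markup included, into one string."""
    # toxml() is available on text and element nodes alike,
    # so no nodeType check is needed here.
    return "".join(node.toxml() for node in nodelist)


def get_text_recursive(nodelist):
    """Collect only the character data, descending into element nodes."""
    parts = []
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            parts.append(node.data)
        elif node.nodeType == node.ELEMENT_NODE:
            # Element nodes have no 'data'; recurse into their children.
            parts.append(get_text_recursive(node.childNodes))
    return "".join(parts)


dom = parseString(SAMPLE)
content = dom.getElementsByTagName("content")[0]

inner = get_inner_xml(content.childNodes)      # markup preserved
text = get_text_recursive(content.childNodes)  # tags stripped
```

For pasting into an XHTML template, the first variant is probably the 
closer fit, since it keeps the <p>, <a> and <blockquote> tags; the second 
one drops all tags and keeps only the text.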

Thanks in advance,

John


