[XML-SIG] Ignoring white-space in Dom trees

James King mail at jameskingexpress.co.uk
Wed Sep 22 21:59:59 CEST 2004


Hi,

I'm parsing XML documents into DOM trees and then trying to manipulate
the XML. In short, I'm having trouble traversing the child nodes in the
tree because of unwanted whitespace-only text nodes.

In more detail:
The XML documents parsed into the DOM look like the sample below:

<root>
	<chapter>
		<page>Lorem ipsum</page>
	</chapter>
	<chapter>
		<page>Lorem ipsum</page>
	</chapter>
</root>
	
I'm using 4Suite's Domlette to parse this XML. The relevant Python 
script is below:

###############

from Ft.Xml.Domlette import NonvalidatingReader
from Ft.Lib import Uri
from Ft.Xml.Lib.Print import PrettyPrint

docUri = Uri.OsPathToUri("doc.xml")
domlette1 = NonvalidatingReader.parseUri(docUri)

nodeList = domlette1.childNodes

#### If I make a copy of the root node and then print it ...
clnd = nodeList[0].cloneNode(1)
print clnd

#### ... I get something like this result:
#### <cElement at 0108DA30: name=u'root', 0 attributes, 5 children>

################

The 5 children include the 3 text nodes that consist solely of the
whitespace characters around the <chapter> elements. I'm only
interested in the chapter elements, and I don't want to have to worry
about whitespace-only text nodes that may or may not be there.
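
To show the effect outside 4Suite, here is a minimal sketch that uses
the standard library's xml.dom.minidom instead of Domlette (that swap is
an assumption made for illustration, and the helper name
element_children is hypothetical). It counts every child of <root> and
then keeps only the element nodes by filtering on nodeType. This is one
possible work-around, not a parser setting.

```python
from xml.dom import minidom, Node

# Same shape as the sample document above, indentation included.
SAMPLE = """<root>
	<chapter>
		<page>Lorem ipsum</page>
	</chapter>
	<chapter>
		<page>Lorem ipsum</page>
	</chapter>
</root>"""


def element_children(node):
    """Hypothetical helper: return only the element children of node."""
    return [child for child in node.childNodes
            if child.nodeType == Node.ELEMENT_NODE]


doc = minidom.parseString(SAMPLE)
root = doc.documentElement

# Every child, including the whitespace-only text nodes.
all_children = len(root.childNodes)

# Only the <chapter> elements; the text nodes are skipped.
chapters = element_children(root)

print(all_children)
print([c.tagName for c in chapters])
```

The filter drops every text node, not just whitespace-only ones, so it
only suits element-only content such as <root> here.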

My questions:
Is there a way to exclude these nodes from the DOM, such as an
ignore_whitespace setting for the 4Suite Domlette (something like the
ignoreWhite property for XML objects in Flash ActionScript)? Otherwise,
are there other Python DOMs that ignore these whitespace nodes by
default? Or has anyone got a work-around for this problem?

I may be missing something obvious; I'm very new to Python.

Thanks in advance if anyone can help.

James

  


