[XML-SIG] Ignoring white-space in Dom trees
James King
mail at jameskingexpress.co.uk
Wed Sep 22 21:59:59 CEST 2004
Hi,
I'm parsing XML documents into DOM trees and then trying to manipulate
the XML. In short, I'm having trouble traversing the child nodes in the
tree because of unwanted white-space text nodes.
In more detail:
The XML documents that are parsed into the DOM look like the sample
below:
<root>
<chapter>
<page>Lorem ipsum</page>
</chapter>
<chapter>
<page>Lorem ipsum</page>
</chapter>
</root>
I'm using 4Suite's Domlette to parse this XML. The relevant Python
script is below:
###############
from Ft.Xml.Domlette import NonvalidatingReader
from Ft.Lib import Uri
from Ft.Xml.Lib.Print import PrettyPrint
docUri = Uri.OsPathToUri("doc.xml")
domlette1 = NonvalidatingReader.parseUri(docUri)
nodeList = domlette1.childNodes
#### If I make a copy of the root node and then print it ...
clnd = nodeList[0].cloneNode(1)
print clnd
#### ... I get something like this result:
#### <cElement at 0108DA30: name=u'root', 0 attributes, 5 children>
################
The 5 children include the 3 text nodes that consist solely of the
white-space characters around the <chapter> elements. I'm only
interested in the chapter elements and I don't want to have to worry
about whitespace-only text nodes that may or may not be there,
depending on how the document happens to be formatted.
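To make the problem concrete without needing 4Suite installed, here is a
minimal sketch that uses the standard library's xml.dom.minidom rather
than Domlette (that swap is an assumption made for illustration; the
indentation in the SAMPLE string is also assumed). It shows the same
white-space text nodes turning up as children of <root>, and one way to
skip them by checking nodeType while traversing:

```python
from xml.dom import minidom, Node

# Same structure as the sample document; the indentation is assumed.
SAMPLE = """<root>
  <chapter>
    <page>Lorem ipsum</page>
  </chapter>
  <chapter>
    <page>Lorem ipsum</page>
  </chapter>
</root>"""

doc = minidom.parseString(SAMPLE)
root = doc.documentElement

# Every child of <root>, including the whitespace-only text nodes.
all_children = root.childNodes

# Only the element children, i.e. the two <chapter> elements.
chapters = [n for n in root.childNodes if n.nodeType == Node.ELEMENT_NODE]

print(len(all_children))
print([c.tagName for c in chapters])
```

Filtering like this leaves the tree itself untouched, so the filter has
to be repeated wherever the children are traversed.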
My questions:
Is there a way to exclude these nodes from the DOM, something like an
ignore_whitespace setting for the 4Suite Domlette (similar to the
ignoreWhite property for XML objects in Flash ActionScript)? Otherwise,
are there other Python DOMs that ignore these whitespace nodes by
default? Or has anyone got a work-around for this problem?
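The kind of work-around I have in mind would look roughly like the sketch
below. It is written against the standard library's xml.dom.minidom, not
Domlette, and strip_whitespace_nodes is a hypothetical helper name, not
part of any library. It walks the tree once and removes every text node
that contains only white-space:

```python
from xml.dom import minidom, Node


def strip_whitespace_nodes(node):
    """Remove, in place, every whitespace-only text node below `node`.

    Hypothetical helper for illustration; not part of minidom or 4Suite.
    """
    # Iterate over a copy, because removeChild mutates node.childNodes.
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_whitespace_nodes(child)


doc = minidom.parseString(
    "<root>\n <chapter>\n  <page>Lorem ipsum</page>\n </chapter>\n"
    " <chapter>\n  <page>Lorem ipsum</page>\n </chapter>\n</root>"
)
root = doc.documentElement
strip_whitespace_nodes(root)

# Only the <chapter> elements remain as children of <root>.
print(len(root.childNodes))
```

One caveat with this approach: it would also drop white-space that is
significant in mixed content (for example a single space between two
inline elements), so it only suits data-style documents like mine.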
I may be missing something obvious, as I'm very new to Python.
Thanks in advance if anyone can help.
James