[XML-SIG] Removing insignificant whitespace
Uche Ogbuji
uche.ogbuji at fourthought.com
Tue Sep 7 05:42:24 CEST 2004
On Tue, 2004-08-31 at 10:49, Brian Quinlan wrote:
> I'm trying to remove the whitespace-only text nodes in my XML DOM. I've
> tried two approaches:
>
> 1. StripXml - generates an exception:
>
>   File "mac.py", line 25, in __init__
>     StripXml(self.document)
>   File "/usr/lib/python2.3/site-packages/_xmlplus/dom/ext/__init__.py", line 153, in StripXml
>     snit = owner_doc.createNodeIterator(startNode, NodeFilter.SHOW_TEXT,
> AttributeError: Document instance has no attribute 'createNodeIterator'
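StripXml only works on 4DOM nodes :-( It relies on createNodeIterator,
which the Document you are passing in does not provide, hence the
AttributeError.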
> 2. setFeature('whitespace_in_element_content', False) seems to do
> nothing
What SAX parser?
> My code is here:
>
> from xml import xpath, dom
> from xml.dom.ext import StripXml
> from xml.dom.xmlbuilder import DOMInputSource, DOMBuilder
> from optparse import OptionParser
> from pprint import pprint
> import os
>
> b = DOMBuilder()
> b.setFeature('whitespace_in_element_content', False)
> self.document = b.parse(...)
> StripXml(self.document)
>
> My XML does not include a DTD or any declarations regarding whitespace.
> Can anyone offer any advice?
I usually use simple generator code for this sort of thing. See
http://www.xml.com/pub/a/2003/01/08/py-xml.html
Using domtools from that article, or a more recent version of the
module:
http://cvs.4suite.org/cgi-bin/viewcvs.cgi/Scimitar/domtools.py
You could do something like (untested):
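from xml.dom import Node
doc.normalize()
# Collect the matches first; removing nodes while the generator is
# still walking the tree may cause it to skip siblings.
ws_only_nodes = list(domtools.doc_order_iter_filter(
    doc, lambda n: n.nodeType == Node.TEXT_NODE and not n.data.strip()
))
for node in ws_only_nodes:
    node.parentNode.removeChild(node)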
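
If you would rather not depend on domtools, the same idea can be sketched
with nothing but the standard library. This is only an illustration, under
some assumptions: it targets a current Python 3 and xml.dom.minidom, the
doc_order_iter_filter below is a stand-in written for this example (it is
not the domtools code), and strip_whitespace_nodes is a hypothetical helper
name.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def doc_order_iter_filter(node, filter_func):
    """Yield node and its descendants, in document order, when filter_func(node) is true.

    Stand-in for the domtools helper of the same name; not the original code.
    """
    if filter_func(node):
        yield node
    for child in node.childNodes:
        yield from doc_order_iter_filter(child, filter_func)


def strip_whitespace_nodes(doc):
    """Remove every whitespace-only text node from doc, in place (hypothetical helper)."""
    doc.normalize()  # merge adjacent text nodes before testing them
    # Materialize the matches before mutating the tree.
    ws_only = list(doc_order_iter_filter(
        doc,
        lambda n: n.nodeType == Node.TEXT_NODE and not n.data.strip()))
    for text_node in ws_only:
        text_node.parentNode.removeChild(text_node)
    return doc


doc = parseString("<a>\n  <b>keep me</b>\n  <c> </c>\n</a>")
strip_whitespace_nodes(doc)
print(doc.documentElement.toxml())
```

Note that this drops whitespace-only text everywhere, including inside
elements such as <c> above where the whitespace may be meaningful content;
narrow the filter if that matters for your documents.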
--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com