[XML-SIG] Removing insignificant whitespace

Uche Ogbuji uche.ogbuji at fourthought.com
Tue Sep 7 05:42:24 CEST 2004


On Tue, 2004-08-31 at 10:49, Brian Quinlan wrote:
> I'm trying to remove the whitespace-only text nodes in my XML DOM. I've 
> tried two approaches:
> 
> 1. StripXml - generates an exception:
> 
>    File "mac.py", line 25, in __init__
>      StripXml(self.document)
>    File "/usr/lib/python2.3/site-packages/_xmlplus/dom/ext/__init__.py", line 153, in StripXml
>      snit = owner_doc.createNodeIterator(startNode, NodeFilter.SHOW_TEXT,
> AttributeError: Document instance has no attribute 'createNodeIterator'

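StripXml only works on 4DOM nodes :-(  As the traceback shows, it calls
createNodeIterator, which your Document instance does not provide.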


> 2. setFeature('whitespace_in_element_content', False) seems to do
>     nothing

What SAX parser?


> My code is here:
> 
> from xml import xpath, dom
> from xml.dom.ext import StripXml
> from xml.dom.xmlbuilder import DOMInputSource, DOMBuilder
> from optparse import OptionParser
> from pprint import pprint
> import os
> 
> b = DOMBuilder()
> b.setFeature('whitespace_in_element_content', False)
> self.document = b.parse(...)
> StripXml(self.document)
> 
> My XML does not include a DTD or any declarations regarding whitespace. 
>   Can anyone offer any advice?

I usually use simple generator code for this sort of thing.  See

http://www.xml.com/pub/a/2003/01/08/py-xml.html

Using domtools from that article, or a more recent version of the
module:

http://cvs.4suite.org/cgi-bin/viewcvs.cgi/Scimitar/domtools.py

You could do something like (untested):

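from xml.dom import Node
import domtools

doc.normalize()  # merge adjacent text nodes first
# Build the full list before removing anything, so the tree is not
# modified while the generator is still walking it.
ws_only_nodes = list(domtools.doc_order_iter_filter(
  doc, lambda n: n.nodeType == Node.TEXT_NODE and not n.data.strip()
))
for node in ws_only_nodes:
    node.parentNode.removeChild(node)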
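
If you would rather not depend on domtools, the same idea can be sketched
against plain minidom.  This is only an illustration, not the domtools
code: doc_order_iter and strip_whitespace_nodes are names I made up for
it, and the sample document is invented.

```python
from xml.dom import Node, minidom


def doc_order_iter(node):
    """Yield node and all of its descendants in document order."""
    yield node
    for child in node.childNodes:
        for descendant in doc_order_iter(child):
            yield descendant


def strip_whitespace_nodes(doc):
    """Remove every whitespace-only text node under doc, in place."""
    doc.normalize()  # merge adjacent text nodes first
    # Collect the matches into a list before removing anything, so the
    # tree is not mutated while the generator is still walking it.
    ws_only_nodes = [
        n for n in doc_order_iter(doc)
        if n.nodeType == Node.TEXT_NODE and not n.data.strip()
    ]
    for n in ws_only_nodes:
        n.parentNode.removeChild(n)
    return doc


# Invented sample input, just to exercise the function.
doc = minidom.parseString("<a>\n  <b>text</b>\n  <c> </c>\n</a>")
strip_whitespace_nodes(doc)
result = doc.documentElement.toxml()
```

Note that this treats all whitespace-only text as insignificant, including
the single space inside <c> above.  Without a DTD there is no way to tell
the two cases apart, so if some of that whitespace matters to you, tighten
the filter (for example, by checking the parent element's name).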


-- 
Uche Ogbuji                                    Fourthought, Inc.
http://uche.ogbuji.net    http://4Suite.org    http://fourthought.com
Meet me at XMLOpen Sept 21-23 2004, Cambridge, UK.  http://xmlopen.org

A hands-on introduction to ISO Schematron - http://www-106.ibm.com/developerworks/edu/x-dw-xschematron-i.html
Practical (Python) SAX Notes - http://www.xml.com/pub/a/2004/08/11/py-xml.html
XML circles the globe - http://www.javareport.com/article.asp?id=9797
Element structures for names and addresses - http://www.ibm.com/developerworks/xml/library/x-elemdes.html
Commentary on "Objects. Encapsulation. XML?" - http://www.adtmag.com/article.asp?id=9090
Harold's Effective XML - http://www.ibm.com/developerworks/xml/library/x-think25.html
A survey of XML standards - http://www-106.ibm.com/developerworks/xml/library/x-stand4/


