[XML-SIG] Removing insignificant whitespace

Fred L. Drake, Jr. fdrake at acm.org
Wed Sep 1 15:26:53 CEST 2004


On Wednesday 01 September 2004 05:30 am, Brian Quinlan wrote:
 > Yes, but whitespace-only nodes are very common in XML formatted for
 > human consumption e.g.
...
 > I count 3 whitespace-only nodes (even after normalize). Those nodes are
 > not useful to the application so I'm wondering about the canonical
 > way of removing them (without writing the [admittedly simple] code

Here are some approaches that can be applied generally; your application may 
be able to use something more specific.

- Don't remove them, just ignore them.  How easy this is depends on how your 
application processes the DOM.  getElementsByTagName() (and the 
namespace-aware variant, getElementsByTagNameNS()) may help here.

- Use a DTD so the parser can determine which whitespace occurs in element 
content and avoid adding it to the tree, as your initial example shows you 
tried.  This *requires* a DTD.

- Use a node filter that discards Text nodes in element content.  This 
requires that your filter knows enough about the document type you're 
expecting that it can identify whitespace in element content.
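
A minimal sketch of the first and third approaches, assuming xml.dom.minidom 
from the standard library.  The helper names (element_children, 
strip_whitespace_nodes) and the sample document are made up for illustration. 
The stripping is done by walking the tree after parsing rather than with a 
parser-level filter, and it guesses that any element with element children 
has element content; that guess is wrong for document types with mixed 
content, so adjust it to what you know about yours.

```python
from xml.dom import minidom, Node


def element_children(node):
    """Approach 1: leave the tree alone and skip non-element children."""
    return [c for c in node.childNodes if c.nodeType == Node.ELEMENT_NODE]


def strip_whitespace_nodes(node):
    """Approach 3 (post-parse): drop whitespace-only Text nodes.

    Assumption: an element that has element children has element content,
    so whitespace-only text directly inside it is insignificant.
    """
    # Recurse first, iterating over a copy because the tree is modified.
    for child in list(node.childNodes):
        if child.nodeType == Node.ELEMENT_NODE:
            strip_whitespace_nodes(child)

    if element_children(node):
        for child in list(node.childNodes):
            if child.nodeType == Node.TEXT_NODE and not child.data.strip():
                node.removeChild(child)


# Hypothetical sample document, indented for human readers.
SAMPLE = "<a>\n  <b>text</b>\n  <c> </c>\n</a>"

doc = minidom.parseString(SAMPLE)
root = doc.documentElement

# Ignoring: only the element children are visited.
print([e.tagName for e in element_children(root)])

# Removing: the whitespace between <b> and <c> goes away, but the
# whitespace inside <c> is kept because <c> has no element children.
strip_whitespace_nodes(root)
print(root.toxml())
```

If the whitespace inside leaf elements such as <c> is also insignificant in 
your document type, the test in strip_whitespace_nodes() needs to know that.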

There are probably other approaches as well.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>


