help with recursive whitespace filter in

MRAB google at mrabarnett.plus.com
Sun May 10 13:35:59 EDT 2009


rustom wrote:
> On May 10, 9:49 pm, Steve Howell <showel... at yahoo.com> wrote:
>> On May 10, 9:10 am, Rustom Mody <rustompm... at gmail.com> wrote:
>>> I am trying to write a recursive filter to remove whitespace-only
>>> nodes for minidom.
>>> The code is below.
>>> Strangely it deletes some whitespace nodes and leaves some.
>>> If I keep calling it -- like so: fws(fws(fws(doc)))  then at some
>>> stage all the ws nodes disappear
>>> Does anybody have a clue?
>>> from xml.dom.minidom import parse
>>> #The input to fws is the output of parse("something.xml")
>>> def fws(ele):
>>>     """ filter white space (recursive)"""
>>>     for c in ele.childNodes:
>>>         if isWsNode(c):
>>>             ele.removeChild(c)
>>>             #c.unlink() Makes no diff whether this is there or not
>>>         elif c.nodeType == ele.ELEMENT_NODE:
>>>             fws(c)
>>> def isWsNode(ele):
>>>     return (ele.nodeType == ele.TEXT_NODE and not ele.data.strip())
>> I would avoid doing things like delete/remove in a loop.  Instead
>> build a list of things to delete.
> 
> Yeah, I know. I would write the whole damn thing functionally if I knew
> how, but I can't figure out the API.
> I actually started out to write a (Haskell-style) copy out of the whole
> tree minus the unwanted nodes, but could not figure it out.
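For the copy-the-tree approach, minidom's own cloneNode(False) gets most of the way: for an element it copies the tag name and attributes but no children, so the filtered children can be appended one by one. This is a minimal sketch, not a library API. copy_without_ws and is_ws_node are hypothetical helper names, and the result is a detached tree owned by the same Document as the input, which is left unchanged.

```python
from xml.dom.minidom import Node, parseString


def is_ws_node(node):
    """True for a text node whose data is only whitespace."""
    return node.nodeType == Node.TEXT_NODE and not node.data.strip()


def copy_without_ws(node):
    """Return a detached copy of `node` with every whitespace-only text
    node dropped at all depths. The input tree is not modified."""
    # Shallow clone: an element keeps its tag name and attributes,
    # a text or comment node keeps its data; no children are copied.
    clone = node.cloneNode(False)
    if node.nodeType == Node.ELEMENT_NODE:
        for child in node.childNodes:
            if not is_ws_node(child):
                # Recurse, so the copy is rebuilt bottom-up.
                clone.appendChild(copy_without_ws(child))
    return clone


doc = parseString("<a>\n  <b>x</b>\n  <c>\n    <d/>\n  </c>\n</a>")
clean = copy_without_ws(doc.documentElement)
print(clean.toxml())  # <a><b>x</b><c><d/></c></a>
```

Because nothing is removed from a list that is being iterated over, this version cannot skip nodes, at the cost of building a second tree.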
> 
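Removing a child while iterating over ele.childNodes shifts the later
siblings left, so the loop skips the node that follows each one removed
(often an element, whose subtree then never gets cleaned). Collect the
whitespace nodes first and remove them after the loop:

def fws(ele):
    """Filter whitespace-only text nodes (recursive)."""
    empty_nodes = []
    for c in ele.childNodes:
        if isWsNode(c):
            # Only record it; removing it now would upset the loop.
            empty_nodes.append(c)
        elif c.nodeType == ele.ELEMENT_NODE:
            fws(c)
    # The iteration is finished, so mutating childNodes is safe.
    for c in empty_nodes:
        ele.removeChild(c)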
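A small check on a toy document shows the difference between removing during the loop and removing afterwards. fws_in_loop is simply the code from the question under a hypothetical name, and count_ws is a hypothetical helper that counts whitespace-only text nodes anywhere in the tree.

```python
from xml.dom.minidom import Node, parseString


def isWsNode(ele):
    return ele.nodeType == Node.TEXT_NODE and not ele.data.strip()


def fws_in_loop(ele):
    """The original approach: removes children while iterating over them."""
    for c in ele.childNodes:
        if isWsNode(c):
            ele.removeChild(c)  # shifts later siblings left; the next one is skipped
        elif c.nodeType == Node.ELEMENT_NODE:
            fws_in_loop(c)


def fws(ele):
    """Two-pass version: collect first, remove after the loop."""
    empty_nodes = []
    for c in ele.childNodes:
        if isWsNode(c):
            empty_nodes.append(c)
        elif c.nodeType == Node.ELEMENT_NODE:
            fws(c)
    for c in empty_nodes:
        ele.removeChild(c)


def count_ws(node):
    """Number of whitespace-only text nodes anywhere below `node`."""
    return sum(isWsNode(c) + count_ws(c) for c in node.childNodes)


XML = "<a>\n  <b>\n    <c/>\n  </b>\n</a>"

buggy = parseString(XML)
fws_in_loop(buggy)
print(count_ws(buggy))  # 2: <b> was skipped, so its whitespace survives

fixed = parseString(XML)
fws(fixed)
print(count_ws(fixed))  # 0
print(fixed.documentElement.toxml())  # <a><b><c/></b></a>
```

Iterating over a snapshot, for c in list(ele.childNodes):, avoids the skipping as well, since removeChild then mutates the live list rather than the one being looped over.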



More information about the Python-list mailing list