xml minidom redundant children??

Diez B. Roggisch deets at nospam.web.de
Thu Mar 1 14:07:09 EST 2007


bkamrani at gmail.com schrieb:
> Great guys:
> 
> As a newbie, I'm trying to simply parse an XML file using minidom, but
> I don't know why I get some extra children(?). I don't know what is
> wrong in the XML file, but I've tried different XML files and still get
> the same problem.
> 
> ******************************************************************************
>            xml file (fileTest) looks like:
> <?xml version="1.0" encoding="ISO-8859-1" ?>
> <afc xmlns="http://python.org/:aaa" xmlns:afc="http://python.org/:foo">
> <afc:Bibliography>
>    <File version="2.0.0.0" publicationDate="2007-02-16 11:23:06+01:00" />
>    <Revision version="2" />
>    <Application version="02.00.00" />
> </afc:Bibliography>
> </afc>
> ******************************************************************************
>             Python file looks like:
> from xml.dom import minidom
> doc = minidom.parse(fileTest)
> a= doc.documentElement.childNodes
> print a
> print '--------------'
> for item in a:
>     print item.nodeName
> ******************************************************************************
>              And output is:
> [<DOM Text node "\n">, <DOM Element: afc:Bibliography at 12082960>,
> <DOM Text node "\n">]
> --------------
> #text
> afc:Bibliography
> #text
> ******************************************************************************
> 
> My question is why this <DOM Text node "\n"> or #text has been
> created, and how to get rid of them by changing the Python code? (Here
> I'm not interested in changing the XML file.)
> 
> I have searched the forum without finding any solution :-(

They won't go away by themselves - xml.dom.minidom can't possibly know 
whether whitespace is significant to you or not.

There are several ways to deal with this. If you have to stay in 
minidom, just loop over the children and discard all whitespace-only 
text-nodes, before really processing the document.
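
A minimal sketch of that filtering step, assuming the XML from the question is available as a string (the XML declaration is left out here only to keep the inline string simple). The helper name strip_whitespace_nodes is made up for this example and is not part of minidom:

```python
from xml.dom import minidom

# The document from the question, inlined for illustration.
XML = """<afc xmlns="http://python.org/:aaa" xmlns:afc="http://python.org/:foo">
<afc:Bibliography>
   <File version="2.0.0.0" publicationDate="2007-02-16 11:23:06+01:00" />
   <Revision version="2" />
   <Application version="02.00.00" />
</afc:Bibliography>
</afc>
"""

def strip_whitespace_nodes(node):
    """Recursively remove text nodes that contain only whitespace."""
    # Iterate over a copy, because removeChild() mutates node.childNodes.
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_whitespace_nodes(child)

doc = minidom.parseString(XML)
strip_whitespace_nodes(doc.documentElement)

# Only element nodes are left among the children now.
names = [item.nodeName for item in doc.documentElement.childNodes]
print(names)
```

After this pass, the loop from the question should report only afc:Bibliography. Note that this throws away every whitespace-only text node, so it is only appropriate if such whitespace really carries no meaning in your documents.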

But the better alternative would be to use a better API for processing 
XML. Use one of the several ElementTree implementations, such as lxml:

http://codespeak.net/lxml/

This will not rid you of the whitespace itself, but it represents text 
differently, so that you can focus on elements without interspersed 
text nodes.
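
To illustrate the difference, here is a small sketch using xml.etree.ElementTree from the standard library (Python 2.5 and later) as a stand-in; lxml.etree offers a largely compatible API, so the same idea should carry over. The XML string is the document from the question, inlined without its XML declaration as an assumption of this example:

```python
import xml.etree.ElementTree as ET

# The document from the question, inlined for illustration.
XML = """<afc xmlns="http://python.org/:aaa" xmlns:afc="http://python.org/:foo">
<afc:Bibliography>
   <File version="2.0.0.0" publicationDate="2007-02-16 11:23:06+01:00" />
   <Revision version="2" />
   <Application version="02.00.00" />
</afc:Bibliography>
</afc>
"""

root = ET.fromstring(XML)

# Iterating over an element yields child elements only. The whitespace
# is still there, but it is stored in the .text and .tail attributes
# instead of in separate nodes.
child_tags = [child.tag for child in root]
for tag in child_tags:
    print(tag)

# Tags are reported in "{namespace-uri}localname" form.
bib = root[0]
grandchild_tags = [child.tag for child in bib]
for tag in grandchild_tags:
    print(tag)

# The newline that minidom reported as a text node shows up here:
print(repr(root.text))
```

The namespace prefix from the file (afc:) does not appear in the tag names; ElementTree expands it to the full namespace URI in curly braces, which takes a little getting used to.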

Diez


