Ignoring XML Namespaces with ElementTree

Stefan Behnel stefan_ml at behnel.de
Thu Dec 3 14:55:17 EST 2009


Pete, 03.12.2009 19:21:
> Is there any way to configure ElementTree to ignore the XML namespace?
> For the past couple months, I've been using minidom to parse an XML
> file that is generated by a unit within my organization that can't
> stick with a standard. This hasn't been a problem until recently, when
> the script was given a 30MB file that, once parsed, increased the
> Python memory footprint by 1.0GB, and now I'm running into Memory
> Errors. Based on Google searches and testing it looks like ElementTree
> is much more efficient with memory and I'd like to switch,

Yes, as long as you use cElementTree, that's certainly the right choice.
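
A minimal sketch of the import, assuming the script may run on both older
and newer Python versions. cElementTree was deprecated in Python 3.3 (where
plain ElementTree uses the C accelerator automatically when it is available)
and removed in 3.9, so a fallback import covers both cases. The sample
document is made up for illustration.

```python
try:
    # Older Pythons: the C implementation has to be requested explicitly.
    import xml.etree.cElementTree as ET
except ImportError:
    # Python 3.9+: cElementTree is gone; ElementTree uses the C accelerator itself.
    import xml.etree.ElementTree as ET

# Hypothetical input, only to show that the API is the same either way.
root = ET.fromstring("<root><item>1</item></root>")
print(root[0].text)
```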


> however I'd
> like to be able to ignore the namespaces. These XML files tend to
> randomly switch the namespace for no reason and ignoring these
> namespaces would help the script adapt to the changes. Any help on
> this would be greatly appreciated. I'm having a hard time finding the
> answer.

ET stores the namespace URI as part of the tag name, in the form
"{uri}localname". So if you want to ignore namespaces, just strip the
leading "{...}" (if any) from the tag and work with the rest (the so-called
"local name").
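
A minimal sketch of that stripping step, assuming the whole tree fits in
memory and that rewriting the tags in place is acceptable. The helper names
local_name and strip_namespaces and the sample document are made up for
illustration; they are not part of ElementTree. Namespaced attribute names
are left untouched here.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Return the tag without a leading '{namespace-uri}' part, if any."""
    # Comments and processing instructions have non-string tags; leave them alone.
    if isinstance(tag, str) and tag.startswith("{"):
        return tag.split("}", 1)[1]
    return tag

def strip_namespaces(root):
    """Rewrite every element's tag in place so only the local name remains."""
    for elem in root.iter():
        elem.tag = local_name(elem.tag)
    return root

# Hypothetical input: one child is namespaced, the other is not.
xml_data = (
    '<a:root xmlns:a="http://example.com/v1">'
    "<a:item>x</a:item><item>y</item>"
    "</a:root>"
)
root = strip_namespaces(ET.fromstring(xml_data))

# Plain names now match regardless of the namespace the file happened to use.
for item in root.findall("item"):
    print(item.text)
```

After this pass, find() and findall() can be used with plain tag names, so
a change of namespace URI in the input no longer affects the lookups.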


> Additionally, anyone know how ElementTree handles XML elements that
> include Unicode?

It's an XML parser, so the answer is: without any difficulties.
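
A small sketch to illustrate, assuming Python 3 and UTF-8 encoded input (the
sample content is made up). The parser decodes the bytes according to the
XML declaration and hands back ordinary text strings.

```python
import xml.etree.ElementTree as ET

# Hypothetical document containing non-ASCII text, encoded as UTF-8 bytes.
data = '<?xml version="1.0" encoding="UTF-8"?><name>Zoë 日本語</name>'.encode("utf-8")

elem = ET.fromstring(data)
print(elem.text)

# Serialising back to a text string keeps the characters intact.
print(ET.tostring(elem, encoding="unicode"))
```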

Stefan


