[XML-SIG] Questions about lxml/ElementTree

Fredrik Lundh fredrik at pythonware.com
Sat Jun 9 21:28:58 CEST 2007


Dave Kuhlman wrote:

> 1. When I get the tag for a node (element) using node.tag, I see
>    something like this::
> 
>        {http://xxxx.com/ns/yyyy}zzzz
> 
>    The stuff inside curly brackets is the namespace.  I don't need
>    that, so I use a regular expression to strip it off.
> 
>    My question is -- Is there a way to get the tag (element name)
>    without a namespace?  I'll feel silly at some time in the
>    future after writing lots of code that strips the namespace
>    if I find that there is an easier way.

you'll probably feel even sillier when someone adds an element with the 
same tag but in a different namespace to the data you're dealing with, 
and your program breaks in a really strange way ;-)

first, the namespace *is* part of the element name.  you should only 
ignore it if you know exactly what you're doing.  ("don't know what it's 
good for" isn't a valid reason ;-)
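
a minimal sketch of the failure mode, assuming the standard
xml.etree.ElementTree module; the namespace URIs and element names below
are made up for illustration:

```python
import xml.etree.ElementTree as ET

# hypothetical document: two "title" elements in different namespaces
doc = ET.fromstring(
    '<root xmlns:a="http://example.com/ns/book" '
    'xmlns:b="http://example.com/ns/person">'
    '<a:title>Moby Dick</a:title>'
    '<b:title>Dr.</b:title>'
    '</root>'
)

# the full element names are distinct ...
tags = [child.tag for child in doc]

# ... but blindly stripping the namespace makes them collide
local_names = [tag.split("}")[-1] for tag in tags]
```

after stripping, code that looks for "title" can no longer tell a book
title from a person's title.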

if you decide that you want to ignore a specific namespace, be explicit.
sometimes, you can define one or more safe-to-ignore namespaces in your 
program, and check for them.  otherwise, you might have to inspect some 
container element, and get the actual namespace from there (the latter's 
sometimes necessary when dealing with some RSS dialects, for example).
once you've figured out what you can safely ignore, you can clean up all 
the tags using something like:

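     tagmap = {}
     for elem in tree.getiterator():  # tree.iter() in ElementTree 1.3+
         try:
             elem.tag = tagmap[elem.tag]
         except KeyError:
             # ... figure out how to handle elem.tag ...
             new_tag = ...
             # targets are assigned left to right, so the map entry must
             # come first (while elem.tag still holds the old name)
             tagmap[elem.tag] = elem.tag = new_tag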
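
one way to fill in the "figure out" step, as a rough sketch: keep an
explicit set of namespaces that are safe to ignore and leave everything
else untouched.  IGNORABLE, clean_tag, clean_tree and the example.com
URIs are hypothetical names chosen for this illustration, and root.iter()
assumes ElementTree 1.3 or later:

```python
import xml.etree.ElementTree as ET

# hypothetical: namespaces this program has decided are safe to ignore
IGNORABLE = {"http://example.com/ns/safe"}

def clean_tag(tag):
    """Drop the namespace only if it is in IGNORABLE; otherwise keep the tag."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        if uri in IGNORABLE:
            return local
    return tag

def clean_tree(root):
    """Rewrite tags in place; return the original -> cleaned tag cache."""
    tagmap = {}
    for elem in root.iter():
        old = elem.tag
        if not isinstance(old, str):
            continue  # comments and processing instructions have non-string tags
        if old not in tagmap:
            tagmap[old] = clean_tag(old)
        elem.tag = tagmap[old]
    return tagmap

root = ET.fromstring(
    '<s:feed xmlns:s="http://example.com/ns/safe" '
    'xmlns:o="http://example.com/ns/other">'
    '<s:item/><o:item/><s:item/></s:feed>'
)
tagmap = clean_tree(root)
```

elements in the unknown namespace keep their full name, so they can't be
mistaken for the ones you actually meant to handle.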

finally, using an RE to do the stripping feels a bit like overkill: I'm 
pretty sure tag.split("}")[-1] is more efficient (but I haven't 
benchmarked it in Python 2.5).
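
if you want to check this on your own interpreter, something along these
lines should do.  the regular expression is only a guess at what such a
pattern might look like (the original one wasn't posted), and the helper
names are made up; no timings are claimed here:

```python
import re
import timeit

# assumed pattern: a leading "{...}" namespace prefix
NS_RE = re.compile(r"^\{[^}]*\}")

def strip_re(tag):
    """Strip the namespace using the regular expression."""
    return NS_RE.sub("", tag)

def strip_split(tag):
    """Strip the namespace using str.split, as suggested above."""
    return tag.split("}")[-1]

tag = "{http://xxxx.com/ns/yyyy}zzzz"

def benchmark(number=100000):
    """Return seconds taken by each approach for `number` calls."""
    return {
        func.__name__: timeit.timeit(lambda: func(tag), number=number)
        for func in (strip_re, strip_split)
    }
```

both helpers return the tag unchanged when it has no namespace, so either
can be applied to every element; call benchmark() to compare them.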

hope this helps!

cheers /F


