[XML-SIG] Questions about lxml/ElementTree
Fredrik Lundh
fredrik at pythonware.com
Sat Jun 9 21:28:58 CEST 2007
Dave Kuhlman wrote:
> 1. When I get the tag for a node (element) using node.tag, I see
> something like this::
>
> {http://xxxx.com/ns/yyyy}zzzz
>
> The stuff inside curly brackets is the namespace. I don't need
> that, so I use a regular expression to strip it off.
>
> My question is -- Is there a way to get the tag (element name)
> without a namespace. I'll feel silly at some time in the
> future after writing lots of code that strips the namespace
> if I find that there is an easier way.
you'll probably feel even sillier when someone adds an element with the
same tag but in a different namespace to the data you're dealing with,
and your program breaks in a really strange way ;-)
first, the namespace *is* part of the element name. you should only
ignore it if you know exactly what you're doing. ("don't know what it's
good for" isn't a valid reason ;-)
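to see how stripping can bite, here is a minimal sketch using the standard library's xml.etree.ElementTree. the document and both namespace URIs are made up for illustration. once the namespace is gone, two different "title" elements look identical:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two "title" elements in different namespaces.
DOC = """<root xmlns:a="http://example.com/ns/a" xmlns:b="http://example.com/ns/b">
  <a:title>book title</a:title>
  <b:title>honorific</b:title>
</root>"""

root = ET.fromstring(DOC)

# Full names keep the "{uri}local" form, so the two tags differ.
full_tags = [child.tag for child in root]
# Stripping the namespace collapses them into the same name.
stripped_tags = [child.tag.split("}")[-1] for child in root]

print(full_tags)
print(stripped_tags)  # ['title', 'title']: the distinction is gone

# With the namespace kept, each element can be found unambiguously.
# ET.QName(uri, local).text builds the "{uri}local" string.
a_title = ET.QName("http://example.com/ns/a", "title").text
print(root.find(a_title).text)  # book title
```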
if you decide that you want to ignore a specific namespace, be explicit.
sometimes, you can define one or more safe-to-ignore namespaces in your
program, and check for them. otherwise, you might have to inspect some
container element, and get the actual namespace from there (the latter's
sometimes necessary when dealing with some RSS dialects, for example).
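a minimal sketch of the "get the namespace from a container element" approach, assuming the root element carries the namespace you care about. the feed documents and URIs below are invented, and namespace_of/qualify are hypothetical helper names, not library functions:

```python
import xml.etree.ElementTree as ET

def namespace_of(elem):
    """Return the namespace URI of elem's tag, or "" if it has none."""
    tag = elem.tag
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return ""

def qualify(ns, local):
    """Build an ElementTree-style "{uri}local" name (plain local if ns is empty)."""
    return "{%s}%s" % (ns, local) if ns else local

# Hypothetical feeds: same structure, different namespace per "dialect".
FEED_V1 = '<feed xmlns="http://example.com/ns/feed-v1"><item>one</item></feed>'
FEED_V2 = '<feed xmlns="http://example.com/ns/feed-v2"><item>two</item></feed>'

def item_texts(xml_text):
    root = ET.fromstring(xml_text)
    ns = namespace_of(root)  # take the actual namespace from the container
    return [item.text for item in root.findall(qualify(ns, "item"))]

print(item_texts(FEED_V1))  # ['one']
print(item_texts(FEED_V2))  # ['two']
```

the lookups stay namespace-aware; the code just learns which namespace to use from the data instead of hardcoding a single URI.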
once you've figured out what you can safely ignore, you can clean up all
the tags using something like:
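    tagmap = {}  # cache: original tag -> cleaned-up tag
    for elem in tree.iter():  # older ElementTree versions spell this getiterator()
        try:
            elem.tag = tagmap[elem.tag]
        except KeyError:
            # ... figure out how to handle elem.tag ...
            new_tag = elem.tag  # placeholder: leave the tag unchanged
            tagmap[elem.tag] = new_tag  # key on the original tag
            elem.tag = new_tag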
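a self-contained version of that loop, assuming the program has an explicit set of safe-to-ignore namespaces. SAFE_NAMESPACES, clean_tag and strip_safe_namespaces are hypothetical names, and the document is made up:

```python
import xml.etree.ElementTree as ET

# Hypothetical: namespaces this program has decided are safe to drop.
SAFE_NAMESPACES = {"http://xxxx.com/ns/yyyy"}

def clean_tag(tag):
    """Strip the namespace from tag only if it is in SAFE_NAMESPACES."""
    if isinstance(tag, str) and tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        if uri in SAFE_NAMESPACES:
            return local
    return tag  # unknown namespaces (and non-string tags) are left alone

def strip_safe_namespaces(tree):
    tagmap = {}  # cache: original tag -> cleaned-up tag
    for elem in tree.iter():
        try:
            elem.tag = tagmap[elem.tag]
        except KeyError:
            new_tag = clean_tag(elem.tag)
            tagmap[elem.tag] = new_tag  # key on the *original* tag
            elem.tag = new_tag

DOC = """<zzzz xmlns="http://xxxx.com/ns/yyyy" xmlns:o="http://example.com/other">
  <child/><o:child/>
</zzzz>"""

tree = ET.ElementTree(ET.fromstring(DOC))
strip_safe_namespaces(tree)
print([elem.tag for elem in tree.iter()])
# ['zzzz', 'child', '{http://example.com/other}child']
```

the cache update is split into two statements on purpose: in a chained assignment like `elem.tag = tagmap[elem.tag] = new_tag`, Python assigns the targets left to right, so `elem.tag` is already rewritten by the time the dictionary key is computed, and the cache ends up keyed on the new tag instead of the original one.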
finally, using an RE to do the stripping feels a bit like overkill,
though: I'm pretty sure tag.split("}")[-1] is more efficient (but I
haven't benchmarked it in 2.5).
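to check the split-vs-RE question on your own Python version, a minimal timeit sketch. timings depend on interpreter and machine, so no numbers are claimed here; the regular expression is one plausible pattern, not necessarily the one Dave used:

```python
import re
import timeit

TAG = "{http://xxxx.com/ns/yyyy}zzzz"
NS_RE = re.compile(r"^\{[^}]*\}")  # matches a leading "{...}" namespace part

def local_name_split(tag):
    # everything after the last "}"; tags without a namespace come back unchanged
    return tag.split("}")[-1]

def local_name_re(tag):
    return NS_RE.sub("", tag)

# both approaches agree, with and without a namespace
assert local_name_split(TAG) == local_name_re(TAG) == "zzzz"
assert local_name_split("zzzz") == local_name_re("zzzz") == "zzzz"

for func in (local_name_split, local_name_re):
    seconds = timeit.timeit(lambda: func(TAG), number=100_000)
    print("%s: %.3f s per 100k calls" % (func.__name__, seconds))
```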
hope this helps!
cheers /F