xml.etree and namespaces -- why?

Jon Ribbens jon+usenet at unequivocal.eu
Wed Oct 19 12:15:24 EDT 2022


On 2022-10-19, Robert Latest <boblatest at yahoo.com> wrote:
>     If the XML input has namespaces, tags and attributes with prefixes
>     in the form prefix:sometag get expanded to {uri}sometag where the
>     prefix is replaced by the full URI.
>
> Which means that given an Element e, I cannot directly access its attributes
> using e.get() because in order to do that I need to know the URI of the
> namespace.

That's because you *always* need to know the URI of the namespace,
because that's its only meaningful identifier. If you assume that a
particular namespace always uses the same prefix then your code will be
completely broken. The following two pieces of XML should be understood
identically:

    <svg xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape">
      <g inkscape:label="Ebene 1" inkscape:groupmode="layer" id="layer1">

and:

    <svg xmlns:epacskni="http://www.inkscape.org/namespaces/inkscape">
      <g epacskni:label="Ebene 1" epacskni:groupmode="layer" id="layer1">

So you can see why e.get('inkscape:label') cannot possibly work, and why
e.get('{http://www.inkscape.org/namespaces/inkscape}label') makes sense.
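
To make that concrete, here is a minimal sketch that parses both variants and reads the attribute by its expanded name. The fragments above are unclosed, so the closing tags are my addition, and the variable names (doc_a, doc_b, label_key) are only illustrative:

```python
import xml.etree.ElementTree as ET

INKSCAPE_NS = "http://www.inkscape.org/namespaces/inkscape"

# Same document twice, differing only in the prefix bound to the namespace.
doc_a = (
    '<svg xmlns:inkscape="%s">'
    '<g inkscape:label="Ebene 1" inkscape:groupmode="layer" id="layer1"/>'
    '</svg>' % INKSCAPE_NS
)
doc_b = (
    '<svg xmlns:epacskni="%s">'
    '<g epacskni:label="Ebene 1" epacskni:groupmode="layer" id="layer1"/>'
    '</svg>' % INKSCAPE_NS
)

g_a = ET.fromstring(doc_a).find("g")
g_b = ET.fromstring(doc_b).find("g")

# The expanded {uri}name form is the same key in both trees.
label_key = "{%s}label" % INKSCAPE_NS
label_a = g_a.get(label_key)
label_b = g_b.get(label_key)

# The prefixed form is not a key in either tree, so this returns None.
prefixed = g_a.get("inkscape:label")

print(label_a, label_b, prefixed)
```

After parsing, the prefix is gone from the attribute keys entirely; both elements end up with identical attrib dictionaries.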

The xml.etree author obviously knew that this was cumbersome, and
hence you can do something like:

    namespaces = {'inkscape': 'http://www.inkscape.org/namespaces/inkscape'}
    element = root.find('inkscape:foo', namespaces)

which will work for both of the above pieces of XML.
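
A quick sketch of that, assuming each document contains a child element foo in the Inkscape namespace (the original snippets don't have one, so the two documents here are invented for illustration). The key in the mapping is only a local alias and need not match the prefix used in either file:

```python
import xml.etree.ElementTree as ET

NS_URI = "http://www.inkscape.org/namespaces/inkscape"

# Local alias -> URI. "ink" is deliberately different from both file prefixes.
namespaces = {"ink": NS_URI}

doc_a = '<svg xmlns:inkscape="%s"><inkscape:foo id="a"/></svg>' % NS_URI
doc_b = '<svg xmlns:epacskni="%s"><epacskni:foo id="b"/></svg>' % NS_URI

# find() expands "ink:foo" to "{uri}foo" using the mapping before matching.
found = [ET.fromstring(doc).find("ink:foo", namespaces) for doc in (doc_a, doc_b)]

tags = [el.tag for el in found]
ids = [el.get("id") for el in found]
print(tags, ids)
```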

But unfortunately, as far as I can see, nobody's thought about doing the
same for attributes rather than tags: e.get() takes no namespaces
argument, so you have to build the {uri}name key yourself.
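
It is easy enough to write a small wrapper yourself, though. The following is only a sketch: get_ns_attr is a name I made up, not something xml.etree provides, and it assumes the same prefix-to-URI mapping you would pass to find():

```python
import xml.etree.ElementTree as ET

def get_ns_attr(element, name, namespaces, default=None):
    """Hypothetical helper: look up 'prefix:attr' via a prefix -> URI mapping."""
    prefix, sep, local = name.partition(":")
    if not sep:
        # No prefix: an ordinary attribute that is in no namespace.
        return element.get(name, default)
    # Raises KeyError if the prefix is missing from the mapping.
    return element.get("{%s}%s" % (namespaces[prefix], local), default)

NS_URI = "http://www.inkscape.org/namespaces/inkscape"
namespaces = {"inkscape": NS_URI}

# The document uses a different prefix than our mapping, on purpose.
doc = (
    '<svg xmlns:epacskni="%s">'
    '<g epacskni:label="Ebene 1" id="layer1"/>'
    '</svg>' % NS_URI
)
g = ET.fromstring(doc).find("g")

label = get_ns_attr(g, "inkscape:label", namespaces)
plain_id = get_ns_attr(g, "id", namespaces)
missing = get_ns_attr(g, "inkscape:nothere", namespaces, default="n/a")
print(label, plain_id, missing)
```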

