xml.etree and namespaces -- why?

Robert Latest boblatest at yahoo.com
Wed Oct 19 09:25:52 EDT 2022


Hi all,

For the impatient: Below the longish text is a fully self-contained Python
example that illustrates my problem.

I'm struggling to understand xml.etree's handling of namespaces. I'm trying to
parse an Inkscape document which uses several namespaces. From etree's
documentation:

    If the XML input has namespaces, tags and attributes with prefixes in the
    form prefix:sometag get expanded to {uri}sometag where the prefix is
    replaced by the full URI.

Which means that given an Element e, I cannot directly access its attributes
using e.get() because in order to do that I need to know the URI of the
namespace. So rather than doing this (see example below):

    label = e.get('inkscape:label')

I need to do this:

    label = e.get('{' + uri_inkscape_namespace + '}label')

...which is the method mentioned in etree's docs:

    One way to search and explore this XML example is to manually add the URI
    to every tag or attribute in the xpath of a find() or findall().
    [...]
    A better way to search the namespaced XML example is to create a
    dictionary with your own prefixes and use those in the search functions.

Good idea! Better yet, that dictionary (or rather, its reverse) already exists,
because etree has used it to unnecessarily mangle the namespaces in the first
place. The documentation doesn't mention where it can be found, but we can
just use the 'xmlns:' attributes of the <svg> root element to rebuild it. Or
so I thought, until I found out that etree deletes exactly these attributes
before handing the <svg> element to the user.
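For what it's worth, the declarations can still be recovered at parse time. ET.iterparse() reports each xmlns declaration as a 'start-ns' event that carries a (prefix, uri) tuple. Below is a minimal sketch that uses a trimmed copy of the document from the example at the end of this post. parse_with_nsmap() is a made-up helper name, and the flat dict assumes that no prefix is re-bound to a different URI further down the tree:

```python
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the Inkscape document from the example at the end of this post.
SVG = b'''<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:svg="http://www.w3.org/2000/svg"
     xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
     xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd">
  <sodipodi:namedview id="namedview287" inkscape:zoom="0.2102413"/>
  <g inkscape:label="Ebene 1" inkscape:groupmode="layer" id="layer1"/>
</svg>'''


def parse_with_nsmap(data):
    """Parse XML bytes; return the root element and a {prefix: uri} dict (hypothetical helper)."""
    nsmap = {}
    root = None
    for event, item in ET.iterparse(io.BytesIO(data), events=('start-ns', 'start')):
        if event == 'start-ns':
            prefix, uri = item       # the default namespace arrives with prefix ''
            nsmap[prefix] = uri      # a re-declared prefix overwrites the earlier entry
        elif root is None:
            root = item              # the first 'start' event belongs to the root element
    return root, nsmap


root, nsmap = parse_with_nsmap(SVG)
layer = root.find('svg:g', nsmap)                   # prefixes in paths are translated via nsmap
print(layer.get('{%s}label' % nsmap['inkscape']))   # Ebene 1
```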

I'm really stumped here. Apart from the fact that I think XML is bloated shit
anyway and has no place outside HTML, I just don't get the purpose of etree's
way of working:

1) Evaluate 'xmlns:' attributes of the <svg> element
2) Use that info to replace the existing prefixes by {uri}
3) Realizing that using {uri} prefixes is cumbersome, suggest that
   the user build their own prefix -> uri dictionary
   to undo the effort of doing 1) and 2)
4) ...but withholding exactly the information that existed in the original
   document by deleting the 'xmlns:' attributes from the <svg> tag
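One possible rationale for step 2: in XML a prefix is only a local alias chosen by whoever wrote the file, and the same namespace can appear under any prefix or under none at all. After expansion, both spellings below (two made-up snippets) produce identical tags. Code can therefore match elements without caring which prefix a given file happened to use:

```python
import xml.etree.ElementTree as ET

# The same SVG structure spelled two ways: default namespace vs. an explicit prefix.
# The prefix name 's' is an arbitrary choice, just like 'svg' or 'inkscape' in a real file.
a = ET.fromstring('<svg xmlns="http://www.w3.org/2000/svg"><rect/></svg>')
b = ET.fromstring('<s:svg xmlns:s="http://www.w3.org/2000/svg"><s:rect/></s:svg>')

# Both trees end up with the same {uri}localname keys.
print(a[0].tag)               # {http://www.w3.org/2000/svg}rect
print(a[0].tag == b[0].tag)   # True
```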

Why didn't they leave the whole damn thing alone? Keep <svg> intact and keep
attributes like 'prefix:key' literally as they are. For anyone wanting to use
the {uri} prefixes (why would they?), they could have thrown in a helper
function for the prefix->URI translation.
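Here is a minimal sketch of such a translation helper. It assumes a hand-written prefix map, and expand() and NS are hypothetical names that are not part of xml.etree. The last call shows that find()/findall() already do the same translation for prefixed names in a path when they are given a namespaces dict:

```python
import xml.etree.ElementTree as ET

# Hypothetical prefix map; it could also be collected from iterparse 'start-ns' events.
NS = {
    '': 'http://www.w3.org/2000/svg',
    'svg': 'http://www.w3.org/2000/svg',
    'inkscape': 'http://www.inkscape.org/namespaces/inkscape',
}


def expand(name, nsmap, attribute=True):
    """Turn 'prefix:local' into '{uri}local' (hypothetical helper, not an etree API)."""
    if ':' in name:
        prefix, local = name.split(':', 1)
        return '{%s}%s' % (nsmap[prefix], local)   # raises KeyError for an unknown prefix
    if attribute or '' not in nsmap:
        return name                                # unprefixed attributes are in no namespace
    return '{%s}%s' % (nsmap[''], name)            # unprefixed tags take the default namespace


root = ET.fromstring(
    '<svg xmlns="http://www.w3.org/2000/svg" '
    'xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape">'
    '<g inkscape:label="Ebene 1" id="layer1"/></svg>'
)
g = root.find(expand('g', NS, attribute=False))
print(g.get(expand('inkscape:label', NS)))   # Ebene 1
print(g.get(expand('id', NS)))               # layer1

# The namespaces argument of findall() performs the same prefix translation inside a path.
print(len(root.findall(".//svg:g[@inkscape:label='Ebene 1']", NS)))   # 1
```

The attribute flag follows the Namespaces in XML rule: an unprefixed attribute belongs to no namespace, while an unprefixed element name takes the default namespace.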

I'm assuming that etree's designers knew what they were doing and meant to make
my life easier when dealing with XML. Maybe I'm missing the forest for the
trees. Can anybody enlighten me? Thanks!


#### self-contained example
import xml.etree.ElementTree as ET

def test_svg(xml):
    root = ET.fromstring(xml)
    for e in root.iter():
        print(e.tag)  # tags are shown prefixed with {URI}
        if e.tag.endswith('svg'):
            # Since namespaces are defined inside the <svg> tag, let's use the info
            # from the 'xmlns:' attributes to undo etree's URI prefixing
            print('Element <svg>:')
            for k, v in e.items():
                print('  %s: %s' % (k, v))
            # ...but alas: the 'xmlns:' attributes have been deleted by the parser

xml = '''<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Created with Inkscape (http://www.inkscape.org/) -->

<svg
   width="210mm"
   height="297mm"
   viewBox="0 0 210 297"
   version="1.1"
   id="svg285"
   inkscape:version="1.2.1 (9c6d41e410, 2022-07-14)"
   sodipodi:docname="test.svg"
   xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
   xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
   xmlns="http://www.w3.org/2000/svg"
   xmlns:svg="http://www.w3.org/2000/svg">
  <sodipodi:namedview
     id="namedview287"
     pagecolor="#ffffff"
     bordercolor="#000000"
     borderopacity="0.25"
     inkscape:showpageshadow="2"
     inkscape:pageopacity="0.0"
     inkscape:pagecheckerboard="0"
     inkscape:deskcolor="#d1d1d1"
     inkscape:document-units="mm"
     showgrid="false"
     inkscape:zoom="0.2102413"
     inkscape:cx="394.78447"
     inkscape:cy="561.25984"
     inkscape:window-width="1827"
     inkscape:window-height="1177"
     inkscape:window-x="85"
     inkscape:window-y="-8"
     inkscape:window-maximized="1"
     inkscape:current-layer="layer1" />
  <defs
     id="defs282" />
  <g
     inkscape:label="Ebene 1"
     inkscape:groupmode="layer"
     id="layer1">
    <rect
       style="fill:#aaccff;stroke-width:0.264583"
       id="rect289"
       width="61.665253"
       height="54.114403"
       x="33.978813"
       y="94.38559" />
  </g>
</svg>
'''

if __name__ == '__main__':
    test_svg(xml)
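If the edited drawing will eventually be written back out, the serializer regenerates the xmlns attributes from the {uri} names. Left to itself, it makes up prefixes such as ns0 and ns1. ET.register_namespace() tells it which prefixes to use. The sketch below shows this; keep in mind that the registration is global to the module:

```python
import xml.etree.ElementTree as ET

SVG_URI = 'http://www.w3.org/2000/svg'
INKSCAPE_URI = 'http://www.inkscape.org/namespaces/inkscape'

root = ET.fromstring(
    '<svg xmlns="%s" xmlns:inkscape="%s">'
    '<g inkscape:label="Ebene 1" id="layer1"/></svg>' % (SVG_URI, INKSCAPE_URI)
)

# Without registration these URIs are unknown to the serializer, so it invents ns0/ns1.
print(ET.tostring(root, encoding='unicode'))

# register_namespace() updates a module-wide uri -> prefix table used on output.
ET.register_namespace('', SVG_URI)             # '' makes SVG the default namespace
ET.register_namespace('inkscape', INKSCAPE_URI)
out = ET.tostring(root, encoding='unicode')
print(out)
```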

