lxml namespace as an attribute

dieter dieter at handshake.de
Thu Aug 16 01:52:23 EDT 2018


Skip Montanaro <skip.montanaro at gmail.com> writes:
> Much of XML makes no sense to me. Namespaces are one thing. If I'm
> parsing a document where namespaces are defined at the top level, then
> adding namespaces=root.nsmap works when calling the xpath method. I
> more-or-less get that.
>
> What I don't understand is how I'm supposed to search for a tag when
> the namespace appears to be defined as an attribute of the tag itself.

You seem to think that you need to take the namespace definitions
from the XML document itself. This is not the case: you can
provide them from whatever source you want.

The important part of a namespace is the namespace uri; the namespace
prefix is just an abbreviation for it. Its exact value is of no importance:
you can use whatever prefix you want, and your choice does not need
to match the one used in the XML document.

"lxml" handles "xmlns" "attributes" differently from "normal" attributes.
"Normal" attributes are accessed via a mapping interface ("attrib");
"xmlns" attributes via the "nsmap" attribute. The "nsmap" of an element
contains all namespace definitions "active" at the element (including
those inherited from its ancestors), not just those defined on the
element itself. Thus, if you are able to locate an element, you can get
its relevant namespace definitions via its "nsmap" (as you did with "root").
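
A small sketch of the difference between "attrib" and "nsmap". The
document, the example.com uris and the variable names are made up
for illustration:

```python
from lxml import etree

# Hypothetical document: "a" is declared on the root element,
# "b" is declared on the nested <a:item> element itself.
doc = etree.fromstring(
    b'<root xmlns:a="http://example.com/a">'
    b'<a:item xmlns:b="http://example.com/b" id="1"><b:leaf/></a:item>'
    b'</root>'
)
item = doc[0]

# "Normal" attributes only; the xmlns declarations do not show up here.
item_attrib = dict(item.attrib)

# Namespace definitions in scope at <a:item>: its own and the inherited one.
item_nsmap = dict(item.nsmap)

# At the root, only the declaration made there is visible.
root_nsmap = dict(doc.nsmap)
```

Here "item_attrib" holds just the "id" attribute, while "item_nsmap"
holds both the "a" and the "b" definitions.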


In my typical applications, I know the relevant namespace uris.
I define a namespace dict:

 ns = dict(
   p1=uri1,
   p2=uri2,
   ...
   )

with prefixes "p1", ... of my own choice and pass "ns"
as "namespaces" (e.g. for "xpath").
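
As an illustration, assume a document where the namespace is declared
on an inner tag rather than at the top level, so that "root.nsmap" is
of no help. The uri and the prefixes "x" and "mine" below are invented
for the example:

```python
from lxml import etree

# Hypothetical document: the namespace is declared on <data>, not on <root>.
doc = etree.fromstring(
    b'<root>'
    b'<data xmlns:x="http://example.com/x"><x:value>42</x:value></data>'
    b'</root>'
)

# Nothing is declared at the top level, so the root's nsmap is empty.
root_nsmap = dict(doc.nsmap)

# Our own namespace dict. The prefix "mine" is our choice; only the uri
# has to match the one used in the document.
ns = dict(mine="http://example.com/x")

values = doc.xpath("//mine:value/text()", namespaces=ns)
```

The search finds the element although the document calls the
namespace "x" and the query calls it "mine".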


Note that in XPath an unprefixed element name matches only elements
in no namespace: you cannot search for a namespace qualified element
by its local part alone (even if that qualification comes from a
default XML namespace declaration). Such searches must always use a
qualified path (i.e. one with a namespace prefix).
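
A sketch of what this means for a default namespace declared on the
tag itself. The document, the uri and the prefix "d" are assumptions
made for the example; "d" is an arbitrary stand-in for the missing
prefix and must not clash with a prefix already in use:

```python
from lxml import etree

# Hypothetical document: <entry> declares a default namespace (no prefix).
doc = etree.fromstring(
    b'<root>'
    b'<entry xmlns="http://example.com/default"><title>hello</title></entry>'
    b'</root>'
)

# An unprefixed name only matches elements in no namespace: nothing found.
unqualified = doc.xpath("//title")

# The default namespace appears in "nsmap" under the key None, which is
# not usable as an XPath prefix. Give it a prefix of our own choice.
entry = doc[0]
ns = {(prefix or "d"): uri for prefix, uri in entry.nsmap.items()}

qualified = doc.xpath("//d:title", namespaces=ns)
titles = [el.text for el in qualified]
```

The unqualified search returns an empty list; the qualified one
returns the "title" element.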



