XML and namespaces

Alan Kennedy alanmk at hotmail.com
Mon Dec 5 12:45:56 EST 2005


[uche.ogbuji at gmail.com]
 > The current, erroneous behavior, which you advocate, is of the same
 > bug.  Minidom is an XML Namespaces aware API.  In XML Namespaces, the
 > namespace URI is *part of* the name.  No question about it.  In Clark
 > notation the element name that is specified in
 >
 > element = document.createElementNS("DAV:", "href")
 >
 > is "{DAV:}href".  In Clark notation the element name of the document
 > element in the created document is "href".

I think if we're going to get anywhere in this discussion, we'll have to 
stick to the convention that we are dealing with some specific values. I 
suggest the following

element_local_name = 'href'
element_ns_prefix  = 'DAV'
element_ns_uri     = 'somescheme://someuri'

Therefore, in Clark notation, the expanded name of the element in the 
OP's example is "{somescheme://someuri}href". (Yes, I know that "DAV:" is 
a valid namespace URI. But it's a poor example because it looks like a 
namespace prefix, and may be giving rise to some confusion.)
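As a quick check of the notation, ElementTree reports element names in Clark notation, so the name can be compared directly. This is only a sketch: clark_name is a helper name made up here for illustration, and the prefix "x" is arbitrary.

```python
import xml.etree.ElementTree as ET

element_local_name = 'href'
element_ns_uri     = 'somescheme://someuri'

def clark_name(ns_uri, local_name):
    # Hypothetical helper: "{uri}local", or just "local" when there is no namespace.
    if ns_uri:
        return "{%s}%s" % (ns_uri, local_name)
    return local_name

expected = clark_name(element_ns_uri, element_local_name)

# Two documents that bind different prefixes to the same namespace uri.
with_dav = ET.fromstring('<DAV:href xmlns:DAV="%s"/>' % element_ns_uri)
with_x   = ET.fromstring('<x:href xmlns:x="%s"/>' % element_ns_uri)
# And one element with the same local name but no namespace at all.
no_ns    = ET.fromstring('<href/>')

print(with_dav.tag == expected)   # the prefix is not part of the name
print(with_x.tag == expected)
print(no_ns.tag == expected)      # a different name: no namespace uri
```

The first two comparisons hold whatever prefix is chosen; the third does not, because the un-namespaced element has a different name.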

So, to create a namespaced element, we must specify the namespace uri, 
the namespace prefix and the element local name, like so

qname = "%s:%s" % (element_ns_prefix, element_local_name)
element = document.createElementNS(element_ns_uri, qname)

Now, if we create, as the OP did, an element with a namespace uri but no 
prefix, like so

element = document.createElementNS(element_ns_uri, element_local_name)

that element *cannot* be serialised naively, because the namespace 
prefix has not been declared. Yes, the element is correctly scoped to 
the element_ns_uri namespace, but it cannot be serialised because 
the Namespaces REC requires every namespace prefix to be declared. 
Relevant quotes from the Namespaces REC are

"""
URI references can contain characters not allowed in names, so cannot be 
used directly as namespace prefixes. Therefore, the namespace prefix 
serves as a proxy for a URI reference. An attribute-based syntax 
described below is used to declare the association of the namespace 
prefix with a URI reference; software which supports this namespace 
proposal must recognize and act on these declarations and prefixes.
"""

and

"""
Namespace Constraint: Prefix Declared
The namespace prefix, unless it is xml or xmlns, must have been declared 
in a namespace declaration attribute in either the start-tag of the 
element where the prefix is used or in an ancestor element (i.e. an 
element in whose content the prefixed markup occurs).
"""

http://www.w3.org/TR/REC-xml-names/
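For anyone who wants to see the behaviour under discussion, here is a minimal sketch, assuming the minidom that ships with a current CPython. It builds the element the way the OP did (namespace uri, no prefix, no declaration), serialises it, and parses the result back.

```python
from xml.dom import minidom

element_local_name = 'href'
element_ns_uri     = 'somescheme://someuri'

document = minidom.Document()
element = document.createElementNS(element_ns_uri, element_local_name)
document.appendChild(element)

# In memory, the element carries its namespace uri as a property.
print(element.namespaceURI)

# Serialisation writes only what is in the tree; no declaration
# attribute was ever added, so none is written.
serialised = element.toxml()
print(serialised)

# Parsing the output back gives an element with no namespace.
reparsed = minidom.parseString(serialised).documentElement
print(reparsed.namespaceURI)
```

The namespace survives in memory but not the round trip, which is the "cannot be serialised naively" case.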

[uche.ogbuji at gmail.com]
 > So you try the tack of invoking "pythonicness".  Well I have one for
 > ya:
 >
 > "In the face of ambiguity, refuse the temptation to guess."

Precisely: If the user has created a document that is not namespace 
correct, then do not try to guess whether it should be corrected or not: 
simply serialize the dud document. If the user wants a namespace 
well-formed document, then they are responsible for either ensuring that 
the relevant namespaces, prefixes and uris are explicitly declared, or 
for explicitly calling some normalization routine that automagically 
does that for them.

[uche.ogbuji at gmail.com]
 > You're guessing that explicit XMLNS attributes are the only way the
 > user means to express namespace information, even though DOM allows
 > this to be provided through such attributes *or* through namespace
 > properties.  I could easily argue that since these are core properties
 > in the DOM, that DOM should ignore explicit XMLNS attributes and only
 > use namespace properties in determining output namespace.  You are
 > guessing that XMLNS attributes (and only those) represent what the
 > user really means.  I would be arguing the same of namespace
 > properties.

I'm not guessing anything: I'm asserting that with DOM Level 2, the user 
is expected to manage their own namespace prefix declarations.

DOM L2 states that "Namespace validation is not enforced; the DOM 
application is responsible. In particular, since the mapping between 
prefixes and namespace URIs is not enforced, in general, the resulting 
document cannot be serialized naively."

DOM L3 provides the normalizeNamespaces method, which the user should 
have to *explicitly* call in order to make their document namespace 
well-formed if it was not already.

http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/namespaces-algorithms.html
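To illustrate what an explicitly called routine could look like on top of minidom, here is a rough sketch. It is NOT the DOM L3 algorithm at the URL above: it is a much simplified stand-in that only looks at element names (attributes and un-declaring a default namespace are ignored), and declare_namespaces is a name made up for this example.

```python
import xml.dom
from xml.dom import minidom

def declare_namespaces(element, in_scope=None):
    # Hypothetical helper, not part of minidom: add any xmlns declarations
    # that elements at or below `element` need but do not have in scope.
    in_scope = dict(in_scope or {})      # prefix (None = default) -> uri
    attrs = element.attributes
    for i in range(attrs.length):        # record declarations already present
        attr = attrs.item(i)
        if attr.name == "xmlns":
            in_scope[None] = attr.value
        elif attr.name.startswith("xmlns:"):
            in_scope[attr.name[len("xmlns:"):]] = attr.value
    uri, prefix = element.namespaceURI, element.prefix
    if uri is not None and in_scope.get(prefix) != uri:
        decl = "xmlns:%s" % prefix if prefix else "xmlns"
        element.setAttributeNS(xml.dom.XMLNS_NAMESPACE, decl, uri)
        in_scope[prefix] = uri
    for child in element.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            declare_namespaces(child, in_scope)

element_ns_uri = 'somescheme://someuri'
document = minidom.Document()
outer = document.createElementNS(element_ns_uri, 'href')       # no prefix
inner = document.createElementNS(element_ns_uri, 'DAV:href')   # prefixed
document.appendChild(outer)
outer.appendChild(inner)

before = outer.toxml()
declare_namespaces(outer)     # the user asks for the fix-up explicitly
after = outer.toxml()
print(before)
print(after)
```

The point of the design is that nothing changes in the document until the user calls the routine; serialisation itself stays dumb.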

The proposal that minidom should automagically fix up namespace 
declarations and prefixes on output would leave it compliant with 
*neither* DOM L2 *nor* L3.

[uche.ogbuji at gmail.com]
 > The reality is that once the poor user has done:
 >
 > element = document.createElementNS("DAV:", "href")
 >
 > They are following DOM specification that they have created an element
 > in a namespace, and you seem to be arguing that they cannot usefully
 > have completed their work until they also do:
 >
 > element.setAttributeNS(xml.dom.XMLNS_NAMESPACE, None, "DAV:")

Actually no, that statement produces "AttributeError: 'NoneType' object 
has no attribute 'split'". I believe that you're confusing "DAV:" as a 
namespace uri with "DAV" as a namespace prefix.

Code for creating the correct prefix declaration is

prefix_decl = "xmlns:%s" % element_ns_prefix
element.setAttributeNS(xml.dom.XMLNS_NAMESPACE, prefix_decl, element_ns_uri)
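Put together as something that can be run, again only as a sketch against the minidom in a current CPython, using the specific values suggested earlier. The last few lines retry the call with None as the qualified name on a scratch element and simply record whatever exception type comes back.

```python
import xml.dom
from xml.dom import minidom

element_local_name = 'href'
element_ns_prefix  = 'DAV'
element_ns_uri     = 'somescheme://someuri'

document = minidom.Document()
qname = "%s:%s" % (element_ns_prefix, element_local_name)
element = document.createElementNS(element_ns_uri, qname)
document.appendChild(element)

# Declare the prefix explicitly, as DOM L2 expects the application to do.
prefix_decl = "xmlns:%s" % element_ns_prefix
element.setAttributeNS(xml.dom.XMLNS_NAMESPACE, prefix_decl, element_ns_uri)

serialised = element.toxml()
print(serialised)

# This time the namespace survives a round trip through the parser.
reparsed = minidom.parseString(serialised).documentElement
print(reparsed.namespaceURI == element_ns_uri)

# The quoted call, with None where the qualified name should be.
scratch = document.createElementNS(element_ns_uri, qname)
try:
    scratch.setAttributeNS(xml.dom.XMLNS_NAMESPACE, None, element_ns_uri)
    error_name = None
except Exception as exc:
    error_name = type(exc).__name__
print(error_name)
```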

 > I'd love to hear how many actual minidom users would agree with you.
 >
 > It's currently a bug.  It needs to be fixed.  However, I have no time
 > for this bewildering fight.  If the consensus is to leave minidom the
 > way it is, I'll just wash my hands of the matter, but I'll be sure to
 > emphasize heavily to users that minidom is broken with respect to
 > Namespaces and serialization, and that they abandon it in favor of
 > third-party tools.

It's not a bug, it doesn't need fixing, minidom is not broken.

Although I am sympathetic to your bewilderment: XML namespaces can be 
overly complex when it comes to the nitty-gritty details.

-- 
alan kennedy
------------------------------------------------------
email alan:              http://xhaus.com/contact/alan


