[XML-SIG] XPath's reliance on id()

Martijn Faassen faassen@vet.uu.nl
Thu, 14 Mar 2002 23:40:28 +0100


Martin v. Loewis wrote:
> Martijn Faassen <faassen@vet.uu.nl> writes:
[snip]
> > Anyway, perhaps the notion of equality is what we need; in my mind two
> > objects can stand in for the same DOM node but not be the same object;
> > they're equal but not identical.
> 
> Strictly speaking, the DOM spec does not guarantee equality of nodes.
> If anything, it guarantees that identity works.

Okay, I want identity, but identity of DOM nodes rather than identity of
Python objects; the two can diverge. I think that is what gives rise to
the confusion.
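
To make the distinction concrete, here is a rough sketch of what I mean.
The names NodeProxy and RealNode are made up for illustration and are not
ParsedXML's actual classes; the only assumption is that each proxy keeps a
reference to the node it stands in for.

```python
class RealNode:
    """Stand-in for the underlying DOM node (hypothetical)."""


class NodeProxy:
    """Hypothetical wrapper; several proxies may stand in for one DOM node."""

    def __init__(self, real_node):
        self._real = real_node

    def isSameNode(self, other):
        # Sameness is decided by the wrapped node, not by the proxy object.
        return self._real is other._real


real = RealNode()
a = NodeProxy(real)
b = NodeProxy(real)

same_object = a is b          # identity of Python objects
same_node = a.isSameNode(b)   # identity of DOM nodes
```

Here same_object is false while same_node is true, which is exactly the
divergence I am talking about.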

> > The notion for equality in DOM nodes is actually supported by the 
> > DOM level 3 working draft:
> > 
> > """
> > isSameNode (introduced in DOM Level 3)
> 
> It is the notion of "sameness" that is supported. The Python mapping
> could mandate that == for nodes holds iff isSameNode holds, but it
> currently doesn't.
> 
> Notice that they also have isEqualNode; this is *not* what we want.

Ah, I just read that, and that's true; it's not what we want here.

> > Then again, I just found out they have a compareTreePosition()
> > method added to the Node interface that we could use for sorting
> > purposes, I think..
> 
> Indeed. Then it would be up to the DOM implementation to make that
> happen. This sounds like the cleanest approach to me.

Yes, I'll have to think about implementing that efficiently.

> > But that is in fact what is needed in this case; I have many different
> > proxy objects which may all map to the same actual DOM node, so they'd
> > have the same __hash__. But perhaps the other implications of __hash__
> > break that. What about supplying a 'key' attribute, anticipating DOM 
> > level 3 vague implications? :)
> 
> If we mandate DOM3 features, I think we should use the feature that
> apparently was explicitly added for XSLT document order:
> compareTreePosition.

Yes, I think I agree. :)

> > I don't think it's reasonable to give those inner nodes the same
> > hash value at all. They're not the same node, and shouldn't hash the
> > same way.
> 
> They are equal nodes (in the sense of isEqualNode), so I see no reason
> why the hashes should be different. If I was to implement a hash of a
> node, I'd use the formula
> 
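> def node_hash(node):
>   # builtin hash() for the scalar parts, node_hash() for the children,
>   # so the function does not shadow (and recurse into) the builtin
>   res = hash(node.nodeType) + hash(node.nodeName)
>   for c in node.childNodes:
>     res += node_hash(c)
>   return res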
> 
> > I don't see any reason to make two different nodes hash the same way just
> > because they have the same name. 
> 
> They have the same name, the same type, and the same content. They
> really are equal.

Yes, but not the same location, and for me the place of a node is
rather important. But yeah, I can see this definition of equality makes
sense, though I can also see a notion of node equality where position is
part of the equality picture. Still, if the DOM defines equality independent
of position, as it seems to do in level 3, and __hash__ implies the notion of
equality, then ParsedXML's current use of __hash__ is wrong.
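
A small sketch of that position-independent reading, using
xml.dom.minidom purely as a convenient test bed. The helper
structural_hash is my own illustrative name, and it ignores attributes
for brevity, so it is only an approximation of what isEqualNode would
compare.

```python
from xml.dom.minidom import parseString


def structural_hash(node):
    # Combine type, name and value with the hashes of the children;
    # the node's position in the tree is deliberately ignored.
    res = hash((node.nodeType, node.nodeName, node.nodeValue))
    for child in node.childNodes:
        res = hash((res, structural_hash(child)))
    return res


doc = parseString("<root><item>x</item><other/><item>x</item></root>")
first, _, third = doc.documentElement.childNodes

equal_hashes = structural_hash(first) == structural_hash(third)
same_node = first.isSameNode(third)
```

The two item elements hash the same although they are not the same node,
so a hash of this kind cannot be used to tell positions apart.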

And I'll have to think about a reasonable implementation of
compareTreePosition(). Any ideas? How do we move ahead with support
for this in XPath and PyXML's DOM implementations? My interest is
primarily in making ParsedXML work well with PyXML's XPath, but I'm willing
to do whatever it takes with PyXML to get there. :)
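
As a starting point for discussion, here is a naive sketch of
document-order comparison, again on top of xml.dom.minidom. The function
names are mine, both nodes are assumed to live in the same document,
attribute nodes are not handled, and the result is a plain -1/0/1 rather
than whatever the DOM level 3 draft ends up specifying as return value.

```python
from xml.dom.minidom import parseString


def tree_path(node):
    # Child indices from the document root down to node, e.g. [0, 2, 1].
    path = []
    while node.parentNode is not None:
        for index, sibling in enumerate(node.parentNode.childNodes):
            if sibling is node:
                path.append(index)
                break
        node = node.parentNode
    path.reverse()
    return path


def compare_tree_position(a, b):
    # Lists compare lexicographically, so an ancestor (a shorter prefix)
    # sorts before its descendants, which matches document order.
    pa, pb = tree_path(a), tree_path(b)
    return (pa > pb) - (pa < pb)


doc = parseString("<r><a><b/></a><c/></r>")
r = doc.documentElement
a = r.childNodes[0]
b = a.childNodes[0]
c = r.childNodes[1]

in_document_order = sorted([c, b, a], key=tree_path)
```

This walks up to the root and scans siblings on every call, so it is not
the efficient version; caching the path or an order index per node would
be the obvious next step.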

Regards,

Martijn