[XML-SIG] SAX Namespaces
Fred L. Drake, Jr.
fdrake@beopen.com
Thu, 6 Jul 2000 16:25:41 -0400 (EDT)
Paul Prescod writes:
> Agreed. I've always said that dictionary-based lookup is important and
> must be provided.
The unexpected part was that you'd *ever* want to iterate over a
list in "normal" applications! Unless the order of the attributes in
the source instance matters, I don't see why you would.
The more I think about it, the more I think a dict-like approach is
the only useful way.
> The question is not which mode should be available, but which should be
> default.
Agreed.
> I would have suggested you use a wrapper approach rather than forking
> the codebase!
Yes, and we'd have said "Have you ever waited for Grail?" Building
wrappers would have been really bad; Grail was never exactly a speed
demon. ;)
> But in the post-namespace world, it isn't clear what to index upon,
> because it depends on what the application is interested in. A lot will
> care about localname/URI pairs. A lot will care about rawnames. A few
> (e.g. search engines) may want lists of attributes with a particular
> namespace.
Understood; I'm not going to argue that the dictionary syntax is
particularly desirable, since no single key type makes sense.
Methods are fine for the interface to sets of attributes. (Nor do I
see sequence behavior as a bad thing for whatever that object turns
out to be.)
I just think that the thing that handles all of this should be the
default.
I described the Grail experience:
> The usage pattern we observed in Grail was that we'd set up default
> values in locals, loop over the attributes list to set up locals, and
> then use the locals while doing whatever we needed to do. It was a
> real pain if we needed to branch on one attribute and then only use
[...]
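For readers who didn't work on Grail, the pattern described above looks roughly like the first function below. This is a minimal sketch, not Grail's actual code. It assumes attributes arrive as a list of (name, value) pairs, and the attribute names `href` and `target` are made up for illustration. The second function expresses the same logic as name lookup on a dict.

```python
# Hypothetical attribute data: (name, value) pairs, the way a
# list-oriented parser might deliver them.
ATTRS = [("href", "index.html"), ("target", "_top")]


def handle_anchor_list(attrs):
    # Grail-style: put defaults in locals, then scan the whole list
    # once to overwrite them, then work with the locals.
    href = None
    target = "_self"
    for name, value in attrs:
        if name == "href":
            href = value
        elif name == "target":
            target = value
    return href, target


def handle_anchor_lookup(attrs):
    # Lookup style: ask only for the attributes actually needed,
    # supplying defaults at the point of use.
    lookup = dict(attrs)
    href = lookup.get("href")
    target = lookup.get("target", "_self")
    return href, target


print(handle_anchor_list(ATTRS))    # ('index.html', '_top')
print(handle_anchor_lookup(ATTRS))  # ('index.html', '_top')
```

Both give the same answer. The lookup version just avoids the defaults-then-loop boilerplate in every handler.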
And Paul said:
> I would encapsulate this behavior in the wrapper class. That's what I
> meant by "lazy indexing." You ask for one attribute and an internal
> dictionary "remembers" where it found it. You ask for another and it
> remembers where it found that. If you only ask by URN/localname pair
> then you don't incur the cost of indexing by qname and if you only ask
> by qname then you don't incur the cost of indexing by pair.
This makes a lot of sense. And it points out why we should have a
really efficient implementation of this; I can imagine a C
implementation that does all the work and maintains all the
appropriate caches, and the parser would use one of them, just like
the Java flavors. A copy() method would give an application its own
copy of the object when it needs one, and the parser could just call
the clear() method at the start of each start tag.
But I still think this should be the default type for the attribute
set.
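To make the idea concrete, here is a minimal Python sketch of such an attribute set. The class and method names (`LazyAttributes`, `get_value`, `get_value_by_qname`, `in_namespace`) are hypothetical and not part of any SAX binding. It also simplifies Paul's description: instead of remembering individual hits, it builds a whole index the first time a lookup of that kind is made. An application that only asks by (URI, localname) therefore never pays for the qname index, and vice versa. copy() and clear() play the roles described above.

```python
class LazyAttributes:
    """Hypothetical attribute set with lazily built lookup indexes.

    Items are (uri, localname, qname, value) tuples; uri is None for
    attributes in no namespace.
    """

    def __init__(self, items=()):
        self._items = list(items)
        self._by_name = None   # (uri, localname) -> value, built on demand
        self._by_qname = None  # qname -> value, built on demand

    def clear(self):
        # A parser could call this at each start tag and reuse the object.
        self._items.clear()
        self._by_name = None
        self._by_qname = None

    def add(self, uri, localname, qname, value):
        self._items.append((uri, localname, qname, value))
        # Any index built so far is now stale.
        self._by_name = None
        self._by_qname = None

    def copy(self):
        # An application that keeps attributes past the event takes a copy.
        return LazyAttributes(self._items)

    def get_value(self, uri, localname, default=None):
        if self._by_name is None:
            self._by_name = {(u, l): v for u, l, q, v in self._items}
        return self._by_name.get((uri, localname), default)

    def get_value_by_qname(self, qname, default=None):
        if self._by_qname is None:
            self._by_qname = {q: v for u, l, q, v in self._items}
        return self._by_qname.get(qname, default)

    def in_namespace(self, uri):
        # Linear scan, e.g. for a search engine that wants one namespace.
        return [(l, v) for u, l, q, v in self._items if u == uri]

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        # Sequence access stays available for code that wants it.
        return iter(self._items)


XLINK = "http://www.w3.org/1999/xlink"
attrs = LazyAttributes()
attrs.add(XLINK, "href", "xlink:href", "a.xml")
attrs.add(None, "id", "id", "n1")
print(attrs.get_value(XLINK, "href"))   # a.xml
print(attrs.get_value_by_qname("id"))   # n1
print(attrs.in_namespace(XLINK))        # [('href', 'a.xml')]
saved = attrs.copy()
attrs.clear()
print(len(saved), len(attrs))           # 2 0
```

A C version would follow the same shape, with the caches kept in the object and reset by clear().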
> I haven't benchmarked this or any strategy. It depends on the
> application. If you find that attributes are slow in your application,
> you could benchmark and replace the AttributeList class with something
> that is more appropriate for it.
And if there's only one really good C implementation, everyone is
happy with what comes out of the box. It makes a nice "battery" to
include. ;)
> It depends on what you are trying to do. There are vast classes of
> applications that I would expect to use the AttributeList class. I'm
> just trying to allow applications to NOT use it if they don't want it.
Is the point of not using it an efficiency issue?
> Here is one way of looking at it. Let's say that there are four popular
> APIs out there:
Ok:
SAX -- efficient version is sufficient
DOM -- all my DOM code requests attributes by name, so
lookup approach works; can be copied to a list on
demand, or the efficient C AttributeList can
provide this internally
Pyxie -- not sure
QP_xml -- exposes a dictionary interface, so something
dict-like should work nicely as long as the
interface & efficiency are right.
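For the dictionary-oriented consumers in that list (QP_xml, and DOM code that fetches attributes by name), a thin read-only mapping over the parser's attribute data may be all that's needed. Here is a minimal sketch, assuming attributes arrive as (qname, value) pairs. The class name `QNameView` is hypothetical and not part of any of these APIs. Subclassing `collections.abc.Mapping` supplies `get()`, `in`, `keys()`, `items()` and `values()` for free. The index is built only on first access.

```python
from collections.abc import Mapping


class QNameView(Mapping):
    """Hypothetical read-only dict-like view keyed by qname."""

    def __init__(self, pairs):
        self._pairs = pairs  # list of (qname, value) pairs from the parser
        self._index = None   # built on first access only

    def _ensure_index(self):
        if self._index is None:
            self._index = dict(self._pairs)
        return self._index

    def __getitem__(self, qname):
        return self._ensure_index()[qname]

    def __iter__(self):
        return iter(self._ensure_index())

    def __len__(self):
        return len(self._ensure_index())


view = QNameView([("id", "n1"), ("class", "note")])
print(view["id"])           # n1
print(view.get("missing"))  # None
print(list(view.items()))   # [('id', 'n1'), ('class', 'note')]
```

Code that really wants a list can still get one on demand with `list(view.items())`.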
> dictionary-structure directly. And even some subset of the 25% may find
> that the dictionary is sub-optimal because it is indexed based on the
> wrong property or properties.
Again, I agree that this is an issue, and using a plain dict is not
the right solution. But methods that do name lookup make a lot of
sense, while a list interface doesn't.
I don't think we really radically disagree; we just need an
AttributeList implementation that meets the performance & sequence
criteria. Nothing that a little time & C code can't fix. ;)
Should I pursue that possibility, or am I missing something really
substantial somewhere? (Probably several things, but... related to
this?)
-Fred
--
Fred L. Drake, Jr. <fdrake at beopen.com>
BeOpen PythonLabs Team Member