[XML-SIG] SAX Namespaces

Fred L. Drake, Jr. fdrake@beopen.com
Thu, 6 Jul 2000 16:25:41 -0400 (EDT)


Paul Prescod writes:
 > Agreed. I've always said that dictionary-based lookup is important and
 > must be provided.

  The unexpected part was that you'd *ever* want to iterate over a
list in "normal" applications!  Unless the order of the attributes in
the source instance is important, I don't see why.
  The more I think about it, the more I think a dict-like approach is
the only useful way.

 > The question is not which mode should be available, but which should be
 > default.

  Agreed.

 > I would have suggested you use a wrapper approach rather than forking
 > the codebase!

  Yes, and we'd have said "Have you ever waited for Grail?"  Building
wrappers would have been really bad; Grail was never exactly a speed
demon.  ;)

 > But in the post-namespace world, it isn't clear what to index upon,
 > because it depends on what the application is interested in. A lot will
 > care about localname/URI pairs. A lot will care about rawnames. A few
 > (e.g. search engines) may want lists of attributes with a particular
 > namespace.

  Understood; I'm not going to argue that the dictionary syntax is
particularly desirable, since there's no one key type that makes
sense; methods are fine for the interface to sets of attributes.  (And
I don't see sequence behavior as a bad thing for that object to have,
whatever it ends up being.)
  I just think that the thing that handles all of this should be the
default.

I described the Grail experience:
 > The usage pattern we observed in Grail was that we'd set up default
 > values in locals, loop over the attributes list to set up locals, and
 > then use the locals while doing whatever we needed to do.  It was a
 > real pain if we needed to branch on one attribute and then only use
[...]
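
  A minimal sketch of the looping pattern described above, next to a
name-lookup version, may help.  The attribute names, the defaults, and
both function names are made up for illustration; this is not Grail's
actual code.

```python
# Hypothetical attribute list: (name, value) pairs in source order.
attrs = [("href", "index.html"), ("target", "_top")]


def handle_by_looping(attrs):
    # Set up defaults in locals, then loop over the list to override them.
    href, target, title = None, "_self", ""
    for name, value in attrs:
        if name == "href":
            href = value
        elif name == "target":
            target = value
        elif name == "title":
            title = value
    return href, target, title


def handle_by_lookup(attrs):
    # Same result using name-based lookup with explicit defaults.
    table = dict(attrs)
    return (table.get("href"),
            table.get("target", "_self"),
            table.get("title", ""))
```

  Both functions return the same tuple for the same input; the lookup
version simply has no dispatch loop to maintain.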

And Paul said:
 > I would encapsulate this behavior in the wrapper class. That's what I
 > meant by "lazy indexing." You ask for one attribute and an internal
 > dictionary "remembers" where it found it. You ask for another and it
 > remembers where it found that. If you only ask by URN/localname pair
 > then you don't incur the cost of indexing by qname and if you only ask
 > by qname then you don't incur the cost of indexing by pair.
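
  The "lazy indexing" idea quoted above could be sketched roughly as
follows.  The class name, method names, and the 4-tuple layout are
assumptions made for this example, not an existing API, and nothing
here has been benchmarked.

```python
class LazyAttributes:
    """Hypothetical wrapper: remembers where each attribute was found."""

    def __init__(self, items):
        # items: (uri, localname, qname, value) tuples in source order.
        self._items = list(items)
        self._pos_by_qname = {}   # filled only by qname lookups
        self._pos_by_pair = {}    # filled only by (uri, localname) lookups

    def _find(self, cache, key, key_of):
        pos = cache.get(key)
        if pos is None:
            # First request for this key: scan once, remember the position.
            for i, item in enumerate(self._items):
                if key_of(item) == key:
                    cache[key] = pos = i
                    break
            else:
                return None
        return self._items[pos][3]

    def get_by_qname(self, qname):
        return self._find(self._pos_by_qname, qname, lambda it: it[2])

    def get_by_pair(self, uri, localname):
        return self._find(self._pos_by_pair, (uri, localname),
                          lambda it: (it[0], it[1]))
```

  An application that only ever calls get_by_qname() never populates
the pair cache, and the other way around, which is the cost trade-off
the quoted text describes.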

  This makes a lot of sense.  And it points out why we should have a
really efficient implementation of this; I can imagine a C
implementation that does all the work and maintains all the
appropriate caches, and the parser would reuse a single instance of
it, just like the Java flavors.  A copy() method would be used to make
a copy of the object as needed, and the parser could just call the
clear() method at the start of each start tag.
  But I still think this should be the default type for the attributes
set.
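
  The clear()/copy() protocol can be illustrated in pure Python,
assuming a parser that reuses one attributes object for every start
tag.  ReusableAttributes, ToyParser, and their methods are hypothetical
names for this sketch; the C implementation discussed above does not
exist here.

```python
class ReusableAttributes:
    """Hypothetical attributes object owned and reused by the parser."""

    def __init__(self):
        self._values = {}

    def clear(self):
        self._values.clear()

    def add(self, name, value):
        self._values[name] = value

    def get(self, name, default=None):
        return self._values.get(name, default)

    def copy(self):
        # Snapshot for handlers that keep attributes past the callback.
        new = ReusableAttributes()
        new._values = dict(self._values)
        return new


class ToyParser:
    """Stand-in for a real parser; only models start-tag dispatch."""

    def __init__(self, handler):
        self._attrs = ReusableAttributes()
        self._handler = handler

    def start_tag(self, name, pairs):
        self._attrs.clear()            # reset at the start of each start tag
        for key, value in pairs:
            self._attrs.add(key, value)
        self._handler(name, self._attrs)
```

  A handler that stores the object it was passed will see it change on
the next start tag; one that stores attrs.copy() keeps the values it
saw.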

 > I haven't benchmarked this or any strategy. It depends on the
 > application. If you find that attributes are slow in your application,
 > you could benchmark and replace the AttributeList class with something
 > that is more appropriate for it.

  And if there's only one really good C implementation, everyone is
happy with what comes out of the box.  It makes a nice "battery" to
include.  ;)

 > It depends on what you are trying to do. There are vast classes of
 > applications that I would expect to use the AttributeList class. I'm
 > just trying to allow applications to NOT use it if they don't want it.

  Is the point of not using it an efficiency issue?

 > Here is one way of looking at it. Let's say that there are four popular
 > APIs out there:

  Ok:

        SAX     -- efficient version is sufficient
        DOM     -- all my DOM code requests attributes by name, so
                   lookup approach works; can be copied to a list on
                   demand, or the efficient C AttributeList can
                   provide this internally
        Pyxie   -- not sure
        QP_xml  -- exposes a dictionary interface, so something
                   dict-like should work nicely as long as the
                   interface & efficiency are right.

 > dictionary-structure directly. And even some subset of the 25% may find
 > that the dictionary is sub-optimal because it is indexed based on the
 > wrong property or properties.

  Again, I agree that this is an issue, and using a plain dict is not
the right solution.  But methods that do name lookup make a lot of
sense, while a list interface doesn't.
  I don't think we really radically disagree; we just need an
AttributeList implementation that meets the performance & sequence
criteria.  Nothing that a little time & C code can't fix.  ;)
  Should I pursue that possibility, or am I missing something really
substantial somewhere?  (Probably several things, but... related to
this?)


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at beopen.com>
BeOpen PythonLabs Team Member