[XML-SIG] Re: saxlib 1.0beta

Lars Marius Garshol larsga@step.de
Fri, 08 May 1998 10:33:21 +0200


Andrew Kuchling wrote:
> 
>         This reminds me of something else.  DOM is going to need to
> use the various drv_ files as well in order to support various
> parsers.  It seems redundant to write code for making SAX work with
> Expat, and then have to write code for DOM via Expat. 

I sat down to write a DOM builder for xmlproc, but discovered there was
no point. Stephane has been really smart about this: the base builder
class uses a SAX interface so SAX drivers are automatically usable as 
DOM builders by using sax_builder as the SAX document handler. 

(The SAX builder converts character data from being a piece of a buffer 
to a single string, which must be done anyway.)
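
To make the idea concrete, here is a minimal sketch of a SAX document
handler that builds a DOM tree. It is not Stephane's sax_builder: it
assumes today's standard-library xml.sax and xml.dom.minidom rather
than saxlib 1.0beta, and the names SaxDomBuilder and build_dom are
made up for illustration. Today's xml.sax already hands characters() a
ready-made string, so the buffer step has no counterpart here; joining
adjacent pieces of text into one node is an addition of this sketch.

```python
import xml.sax
from xml.dom.minidom import getDOMImplementation


class SaxDomBuilder(xml.sax.ContentHandler):
    """Hypothetical SAX document handler that builds a minidom tree."""

    def __init__(self):
        super().__init__()
        self.document = None
        self._stack = []   # open element nodes, innermost last
        self._chunks = []  # character data received but not yet stored

    def _flush_text(self):
        # A parser may deliver one run of text in several pieces;
        # join them into a single string before creating the node.
        if self._chunks and self._stack:
            text = "".join(self._chunks)
            self._stack[-1].appendChild(self.document.createTextNode(text))
        self._chunks = []

    def startElement(self, name, attrs):
        if self.document is None:
            # The first element seen becomes the document element.
            impl = getDOMImplementation()
            self.document = impl.createDocument(None, name, None)
            node = self.document.documentElement
        else:
            self._flush_text()
            node = self.document.createElement(name)
            self._stack[-1].appendChild(node)
        for key, value in attrs.items():
            node.setAttribute(key, value)
        self._stack.append(node)

    def characters(self, content):
        self._chunks.append(content)

    def endElement(self, name):
        self._flush_text()
        self._stack.pop()


def build_dom(xml_bytes):
    """Parse XML bytes with the default SAX driver, return a Document."""
    builder = SaxDomBuilder()
    xml.sax.parseString(xml_bytes, builder)
    return builder.document
```

Since the handler only sees the events that SAX reports, a comment in
the input never reaches the tree.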

> (DOM can't just sit on top of SAX because level 1 SAX doesn't provide 
> an interface for comments, so you lose comments when you go 
> through SAX.  This is bad for DOM-using applications that modify XML 
> documents.) 

Well, we don't preserve entity information either, or whitespace in
tags, so writing an XML editor on top of this is probably not going to
work anyway. In fact, all the XML parsers I know of today throw away
lots of lexical information. I've thought about adding this to xmlproc,
but have so far refrained from it, since it would mean slowing it down
and it would be a lot of work both to implement and to get the
interfaces right.

So I'm not sure it's worth the extra bother just to get comments into
the DOM tree. And if we do decide we should have comments, then I think
having non-standard SAX drivers is the way to go. Anyone else have an 
opinion on this? Or a use for comments in the DOM? 

Personally I think we should leave them out, for exactly the same 
reasons they were left out of SAX, unless someone can think of a 
convincing argument why editor-like applications can work without 
the rest of the lexical information.

> Sharing drivers would also let both SAX and DOM use an ESIS driver, or
> anything else that gets written.

That we can do already, since ESIS does not contain comments. (One can
pick out entity boundaries from an ESIS stream, though not from the one
generated by nsgmls, if I remember correctly.)

> Therefore, should the xml.sax.drivers package be moved up a
> level, to xml.drivers?

If we think people will use the DOM without SAX, I think we should do
that, yes.

One other thing is that I think it should be a little easier to make
SAX and the DOM work together. Ideally there should be a function that
lets you say

     make_dom("mydoc.xml") # would be really cool if it worked for 
                           # .sgml? and .html? as well :-)

and gave you back a DOM Document object. (Behind the scenes the 
SAX ParserFactory should of course be used to get a parser driver.)
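
A minimal sketch of what such a function could look like, assuming
today's standard library instead of saxlib: xml.sax.make_parser()
stands in for the ParserFactory, and xml.dom.minidom.parse() accepts a
SAX parser to build the tree from. This version handles XML only; the
.sgml and .html cases are left out.

```python
import io
import xml.sax
from xml.dom import minidom


def make_dom(source, parser=None):
    """Return a DOM Document for a file name or an open binary file.

    If no parser is given, ask the SAX layer for its default driver.
    """
    if parser is None:
        parser = xml.sax.make_parser()
    return minidom.parse(source, parser)


# Usage: a file name works the same way as this in-memory file.
doc = make_dom(io.BytesIO(b"<doc><p>hello</p></doc>"))
print(doc.documentElement.tagName)
```

Passing the parser in as an argument keeps the door open for a caller
who wants a specific driver instead of the default one.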

Another thing I've been thinking of is to add some methods like

    get_parser_name()  # xmllib, xmlproc, pyexpat or XML-Toolkit
    is_validating()    # Only drv_xmlproc_val so far
    reads_dtd()        # Will have to be defined carefully
    is_fast()          # Only pyexpat returns true here

to the SAX drivers. This would make the ParserFactory much more powerful
and would be nice for other things as well. (Of course, only pyexpat
would answer true to the last method.) Anyone against it, or who would
prefer something different?
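
A rough sketch of how a factory could use these methods to pick a
driver. Everything here is hypothetical: DriverInfo, DRIVERS and
choose_driver are illustrative names, not part of saxlib, and the
reads_dtd values are placeholders, since that method still has to be
defined carefully. The is_validating and is_fast values follow the
comments in the list above.

```python
class DriverInfo:
    """Hypothetical stand-in for a SAX driver that describes itself."""

    def __init__(self, name, validating=False, reads_dtd=False, fast=False):
        self._name = name
        self._validating = validating
        self._reads_dtd = reads_dtd
        self._fast = fast

    def get_parser_name(self):
        return self._name

    def is_validating(self):
        return self._validating

    def reads_dtd(self):
        return self._reads_dtd

    def is_fast(self):
        return self._fast


# Illustrative registry in order of preference. The validating entry
# stands for drv_xmlproc_val; a validating parser has to read the DTD.
# reads_dtd for the other entries is a placeholder, not a claim about
# the real drivers.
DRIVERS = [
    DriverInfo("pyexpat", fast=True),
    DriverInfo("xmlproc", validating=True, reads_dtd=True),
    DriverInfo("xmllib"),
]


def choose_driver(drivers, validating=False, fast=False):
    """Return the first driver meeting every requirement, or None."""
    for driver in drivers:
        if validating and not driver.is_validating():
            continue
        if fast and not driver.is_fast():
            continue
        return driver
    return None
```

With this registry, asking for a fast driver gives pyexpat, asking for
a validating one gives xmlproc, and asking for both gives None.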

--Lars M.