[XML-SIG] New Reader Architecture

uche.ogbuji@fourthought.com
Mon, 06 Nov 2000 13:46:16 -0700


> > We have rewritten most of the code used for creating text from DOMs.
> > I've cc'ed xml-sig because the check-ins of 4DOM I'll be making
> > today reflect these changes.
> 
> Very interesting. Are you following the DOM Level 3 discussions on
> load-and-save interfaces? [I couldn't access the draft right now, so
> I can't check whether it is related to your work]

Not yet.  The first draft did not cover load and save at all.  I haven't 
perused the second draft, but in any case we will wait until it is somewhat 
closer to Candidate Recommendation (CR) before we move to DOM L3.  We wasted a 
lot of effort moving to the draft DOM L2 namespace interfaces, only to have 
them change quite a bit.

> > Using one of the new reader classes is also simple.  You create an
> > instance passing in to the constructor any parameters relevant to the
> > state of that class.
> 
> While support for customization is a good thing, I think many users
> won't need it, or might get confused by it. So I'd prefer to have some
> guidelines on what the "good for most uses" way of getting a DOM is.

OK.  I'll try to get some documentation along those lines in before the release.

> > Once you have the reader instance, you use the fromStream or fromUri
> > method to create each DOM.  The equivalents to the other common utility
> > reader functions (say fromString or fromFile) have been eliminated for
> > simplicity since it is trivial to turn text or a filename into a
> > stream.
> 
> Can you please bring the fromString interface back? In interactive
> mode, it is a pain to type StringIO.StringIO.

OK.
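To make the reader pattern concrete, here is a rough sketch written for modern 
Python 3, using the standard library's xml.dom.minidom in place of 4DOM.  
MinidomReader and its normalize_text option are hypothetical names for 
illustration, not 4Suite API.  The point is only that state lives on the 
instance, fromStream does the real work, and fromString and fromUri are thin 
wrappers around it.

```python
import io
import urllib.request
from xml.dom import minidom


class MinidomReader:
    """Hypothetical reader: options go to the constructor, and each
    fromXxx method returns a new DOM Document."""

    def __init__(self, normalize_text=False):
        # Hypothetical option standing in for "parameters relevant to
        # the state of that class".
        self.normalize_text = normalize_text

    def fromStream(self, stream):
        """Build a DOM from any object with a read() method."""
        doc = minidom.parse(stream)
        if self.normalize_text:
            doc.normalize()  # merge adjacent text nodes
        return doc

    def fromUri(self, uri):
        """Open a URL (http:, file:, ...) via urllib and parse it."""
        with urllib.request.urlopen(uri) as stream:
            return self.fromStream(stream)

    def fromString(self, text):
        """Convenience wrapper: turn text into a byte stream first."""
        return self.fromStream(io.BytesIO(text.encode("utf-8")))


reader = MinidomReader()
doc = reader.fromString("<doc><p>hello</p></doc>")
print(doc.documentElement.tagName)  # doc
```

Here io.BytesIO plays the role StringIO.StringIO plays in Python 2.  Because 
the text is encoded to UTF-8 first, the sketch assumes documents without a 
conflicting encoding declaration.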

> Also, what is the complication that makes urllib not work for fromUri?
> In the Python 2 SAX2 interfaces, you can pass a string to parse, and
> it will then consider that as a system identifier. In turn, it will
> pass it to urllib, which will open either a local file or the URL.

Ah, but not all URIs are URLs.  What if you have a URN resolution handler?  
This will be especially relevant with 4Suite Server, which provides URN/UUIDs 
for XML documents in its repositories, and also provides a matching URI 
handler that can easily be plugged into XPath, XSLT, RDF, XPointer, etc.
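A minimal sketch of what a pluggable URI handler can look like, assuming a 
simple registry keyed on URI scheme: anything without a registered handler 
falls through to urllib, while a urn: handler (here a hypothetical in-memory 
dictionary standing in for a repository) resolves names urllib knows nothing 
about.  UriResolver, register and resolve are illustrative names, not the 
4Suite Server interface.

```python
import io
import urllib.request
from urllib.parse import urlsplit


class UriResolver:
    """Hypothetical resolver that dispatches on the URI scheme."""

    def __init__(self):
        self._handlers = {}  # scheme -> callable(uri) -> readable stream

    def register(self, scheme, handler):
        self._handlers[scheme.lower()] = handler

    def resolve(self, uri):
        scheme = urlsplit(uri).scheme.lower()
        handler = self._handlers.get(scheme)
        if handler is not None:
            return handler(uri)
        # No plug-in for this scheme: treat it as a URL and let urllib
        # handle http:, ftp:, file:, and friends.
        return urllib.request.urlopen(uri)


# Hypothetical repository contents keyed by URN.
REPOSITORY = {
    "urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6": b"<doc/>",
}

resolver = UriResolver()
resolver.register("urn", lambda uri: io.BytesIO(REPOSITORY[uri]))
stream = resolver.resolve("urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6")
print(stream.read())  # b'<doc/>'
```

Plain urllib, by contrast, rejects the same urn: string with an "unknown url 
type" error, which is why handing every system identifier straight to urllib 
is not enough for fromUri.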

> > [Note that the Domlette readers also have an argument to fromStream,
> > stripElements, for specifying elements from which white-space is to be
> > stripped while building the DOM.  This is merely to support some
> > internal XSLT optimizations until a better way can be found.  Using
> > these arguments is deprecated and they may be removed from the method
> > signatures in any future 4Suite release.]
> 
> Isn't a validating parser supposed to indicate which elements can have
> their whitespace stripped?

Not directly, but of course you can use the ignorableWhitespace callback if 
you're using SAX.

However, the reader support for stripping is another matter entirely.  XSLT 
(via xsl:strip-space) allows you to specify elements whose whitespace-only 
text nodes are to be stripped from source documents.  Originally, 4XSLT would 
create the DOM normally and then strip the relevant whitespace nodes, but this 
was horribly inefficient.  We sped things up several-fold simply by stripping 
whitespace as we built the DOM.  This is why we have the interface, and it is 
also why it is not recommended for regular use: it's pretty much a hack (but a 
very important hack) for XSLT performance.
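The build-time approach can be sketched with the standard library's SAX 
interface and xml.dom.minidom.  This is not the Domlette code: StrippingBuilder 
and strip_elements are hypothetical names that loosely mirror the 
stripElements argument, matching elements by plain tag name and ignoring 
namespaces.  Whitespace-only text directly inside a listed element is simply 
never turned into a node, so there is no second pass over the tree.

```python
import xml.sax
from xml.dom.minidom import getDOMImplementation

XML_WHITESPACE = " \t\r\n"


class StrippingBuilder(xml.sax.ContentHandler):
    """Build a minidom Document, dropping whitespace-only text that sits
    directly inside any element named in strip_elements."""

    def __init__(self, strip_elements):
        super().__init__()
        self.strip_elements = set(strip_elements)
        self.doc = None
        self.stack = []  # open elements, innermost last
        self.text = []   # character data buffered since the last tag

    def _flush(self):
        if not self.text:
            return
        data = "".join(self.text)
        self.text = []
        parent = self.stack[-1]
        if not data.strip(XML_WHITESPACE) and parent.tagName in self.strip_elements:
            return  # stripped during the build; no node is ever created
        parent.appendChild(self.doc.createTextNode(data))

    def startElement(self, name, attrs):
        if self.doc is None:
            # The first element becomes the document element.
            self.doc = getDOMImplementation().createDocument(None, name, None)
            elem = self.doc.documentElement
        else:
            self._flush()
            elem = self.doc.createElement(name)
            self.stack[-1].appendChild(elem)
        for key, value in attrs.items():
            elem.setAttribute(key, value)
        self.stack.append(elem)

    def endElement(self, name):
        self._flush()
        self.stack.pop()

    def characters(self, content):
        if self.stack:  # ignore anything outside the document element
            self.text.append(content)


def parse_stripping(text, strip_elements):
    handler = StrippingBuilder(strip_elements)
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.doc


doc = parse_stripping("<doc><a>\n  <b>x</b>\n</a><c> </c></doc>", ["a"])
print(doc.documentElement.toxml())  # <doc><a><b>x</b></a><c> </c></doc>
```

Text is buffered until the next tag because SAX may deliver character data in 
several chunks; deciding chunk by chunk could wrongly drop the leading 
whitespace of a real text node.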

> > Python 1.x users can break circular dependencies by calling the
> > releaseNode method on the reader that was used to create the DOM:
> > 
> > reader.releaseNode(xml_doc)
> 
> What kind of circularity does that break? The one in the tree? Does
> that mean I have to keep the reader until I release the tree?

Yes and maybe.  You don't have to keep the reader around if you're sure what 
type of DOM you have.  However, if you call cDomlette's ReleaseNode on a 
pDomlette node, it will break, and vice versa.  That's why releaseNode is also 
available on the reader instance as a convenience: the reader always knows 
which implementation built the tree.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python