[XML-SIG] parsing xml schema

Uche Ogbuji uche.ogbuji@fourthought.com
Sat, 08 Dec 2001 09:51:58 -0700


Playing e-mail catch-up again...

> Uche Ogbuji wrote:
> 
> > Don't hesitate to ask if you need help with this task.  In fact, if you were
> > able to write up what you did to use PyTREX as a validator I would love to
> > make this available to others.
> 
> OK, here is how I see it.
> 
> Basically, I need to validate XML data. The input may be either textual XML
> that is submitted to the application, or DOM structures that have been
> retrieved from some storage (content repos, pickled DOM in RDBMS?). The
> DOM structures are very likely to be either pDomlette, or 4Suite 0.12 R/W
> cDomlettes (he said hopefully ;-)

Mutable cDomlette is already in CVS, though we're working on a few lingering 
stability problems.


> Another, perhaps more esoteric, case is where the TREX pattern is stored in a DOM,
> having perhaps been generated from an XSLT transform, although off-hand I can't
> picture any use cases for such a scenario?
> 
> It is very likely to be the case that I will need a persistable "compiled" version
> of the trex pattern, since I will have a set of 10 to 100 handwritten trex
> patterns that will be used continually, and I don't want to parse them each time.
> It is quite likely I could just pickle the pattern after parsing, but that remains
> to be verified.

The trick here, of course, is to pickle the internal representation that the 
TREX processor builds, but I'm not very familiar with PyTREX guts.
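
For what it's worth, the caching side can be sketched independently of PyTREX. 
The snippet below is only an illustration: load_compiled and compile_fn are 
hypothetical names, and it assumes that whatever object the TREX parser returns 
can actually be pickled, which, as noted above, still needs to be verified.

```python
import os
import pickle


def load_compiled(source_path, cache_path, compile_fn):
    """Return a compiled pattern, reusing a pickle cache when it is fresh.

    compile_fn is a stand-in for whatever parses a pattern file into an
    in-memory object; it is called only if the cache is missing or older
    than the source file.
    """
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(source_path)):
        with open(cache_path, "rb") as f:
            return pickle.load(f)

    # Cache miss: parse the pattern and store the result for next time.
    compiled = compile_fn(source_path)
    with open(cache_path, "wb") as f:
        pickle.dump(compiled, f, protocol=pickle.HIGHEST_PROTOCOL)
    return compiled
```

With 10 to 100 handwritten patterns, one cache file per pattern would keep the 
bookkeeping simple: editing a pattern only invalidates its own cache.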


> Validating textual XML is simple. Create a pyTrex instance from the textual
> XML using the "parse_Instance" function, create a trex instance from the
> textual trex file using the "parse_Trex" function, and then use the "validate"
> function to match the former against the latter.
> 
> However, it is more complex when it comes to DOMs, mainly because pyTrex uses
> non-SAX/DOM interfaces in order to speed things up as much as possible.
> Efficiently integrating with [cp]Domlette is non-trivial, for the following
> reasons.
> 
> 1. pyTrex uses the pyExpat (non-SAX) callback interfaces directly, presumably to
> increase speed.

Probably inevitable:  The SAX layer adds an unfortunate amount of overhead 
right now.
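
To illustrate what going straight to the pyexpat callbacks looks like, here is 
a minimal sketch using the standard library's xml.parsers.expat module. It is 
not PyTREX's code; count_elements is a hypothetical helper that simply tallies 
element names, with the handler attached directly to the parser and no SAX 
layer in between.

```python
import xml.parsers.expat


def count_elements(xml_data):
    """Count start tags by element name using raw expat callbacks."""
    counts = {}

    def start_element(name, attrs):
        # Called by expat for every start tag; attrs is a dict of attributes.
        counts[name] = counts.get(name, 0) + 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    # The second argument marks this as the final chunk of input.
    parser.Parse(xml_data, True)
    return counts
```

A validator built this way has to track its own element stack and state, which 
is part of why such code does not plug into an existing DOM tree directly.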


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Boulder, CO 80301-2537, USA
XML strategy, XML tools (http://4Suite.org), knowledge management