XML Schema?

Harry George hgg9140 at cola.ca.boeing.com
Wed Feb 14 12:58:53 EST 2001


Thanks for the pointer.  I didn't find LTXML in my initial literature
search.  Given that it exists, I don't see much reason to continue on
my effort.  Maybe the xml-conf site could use the testcase generator.

Another possibility I considered was to do a python binding to the
Apache Xerces-C++ parser.  Do you know if anyone has done that?  That
would hook into IBM's significant C++/Java XML-oriented releases.

I'm not a fan of Schema either, but it sure is being hyped to the
local decision makers -- so I need a python treatment.  The whole XML
world has migrated from "It is deliberately simple so all languages can
play" to "Let's complexify it so those pesky GPL guys can't keep up."

Uche Ogbuji <uche at ogbuji.net> writes:

> Harry George wrote:
> > 
> > Anyone have a python XML Schema parser/validator?  I thought I saw
> > comments that it wasn't being done yet as part of xml-sig.  Of course,
> > we don't actually need an XML Schema validator in python (java or C++
> > renditions would do fine), but there is a social cachet to it, so
> > maybe worth the effort.
> 
> I'm not personally a fan of XML Schemas, but I think this would be a
> very worth-while project.  You'd probably get plenty of help as well.
> 
> > Assuming it is an open task, here is an approach.  Anyone see holes in
> > this, besides it being a humongous task?
> > 
> > 1. Get the specs from OASIS-->W3C.
> > 
> > 2. Get test cases (for schemas and for instances).  There are a few
> >    cases at xml-conf, but I think a lot more will be needed.  So I'll
> >    need to generate them, and that suggests a case generator, plus of
> >    course a test driver.  I have the testcase generator and driver
> >    done.
> > 
> > 3. XML Schema is basically a regular expression problem, with nodes as
> >    the "characters".
> 
> Hmm.  I wouldn't go this far.  The most basic parts of the content model
> are so, but the entire data-type system and parts of the content model
> need a different approach than regular grammar.
> 
> >    So we can use classical lexer algorithms:
> >    regexpr --> NFA --> DFA.  The hassles may be at the leaf nodes,
> >    where XML Schema has lots of special cases.  I don't know if there
> >    are non-re constraints in the specs, but if so I'd apply them after
> >    the initial pass.
> 
> Interesting approach.
> 
> > 4. Given that state machine, run schemas through the parser until it can
> >    build machines from valid schemas and detect invalid ones.
> > 
> > 5. Given a sound state machine, run instance test cases through the
> >    package until it is passing valid instances and detecting invalid
> >    ones.
> > 
> > 6. This would probably be an iterative enhancement exercise, once the
> >    state machine engine was in place.
> > 
> > I have a lex-workalike I wrote in Modula-2, which I'll use as the
> > start point.  Probably could use a SAX input approach ("next node"
> > instead of "next char"), maybe with 1 lookahead.
> 
> Just to note: LT-XML supposedly has a Python interface and an XSchemas
> validator.  I still think your effort would be worth-while, especially
> given your fresh approach.
> 
> http://www.ltg.ed.ac.uk/software/xml/
> 
> 
> -- 
> Uche Ogbuji
> Personal:   uche at ogbuji.net		http://uche.ogbuji.net
> Work:       uche.ogbuji at fourthought.com	http://Fourthought.com
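
For reference, the "nodes as characters" idea in point 3 of the quoted
approach, combined with the SAX "next node" input, might look roughly
like the sketch below.  It is only an illustration under stated
assumptions: the tuple notation for content models, the element names
(doc, title, para, note), and every class and function name are
hypothetical, and it is not the Modula-2 lex-workalike mentioned above.
It builds an NFA by Thompson construction and simulates it on a set of
states rather than precomputing a DFA, and it covers only the regular
part of a content model (no datatypes, no occurrence counts), which is
exactly the limitation raised in the reply.  It assumes a current
Python 3 with the standard xml.sax module.

```python
import xml.sax

# Hypothetical mini-notation for a content model (not XML Schema syntax):
#   ("elem", name) | ("seq", a, b) | ("alt", a, b, ...) | ("star", a)


class NFA:
    """NFA whose input symbols are element names instead of characters."""

    def __init__(self):
        self.eps = {}    # state -> set of states reachable on epsilon
        self.step = {}   # (state, element name) -> set of next states
        self.count = 0

    def new_state(self):
        self.count += 1
        return self.count - 1

    def add_eps(self, a, b):
        self.eps.setdefault(a, set()).add(b)

    def add_step(self, a, name, b):
        self.step.setdefault((a, name), set()).add(b)


def build(nfa, model):
    """Thompson construction: return (start, accept) for this fragment."""
    kind = model[0]
    start, accept = nfa.new_state(), nfa.new_state()
    if kind == "elem":
        nfa.add_step(start, model[1], accept)
    elif kind == "seq":
        s1, a1 = build(nfa, model[1])
        s2, a2 = build(nfa, model[2])
        nfa.add_eps(start, s1)
        nfa.add_eps(a1, s2)
        nfa.add_eps(a2, accept)
    elif kind == "alt":
        for sub in model[1:]:
            s, a = build(nfa, sub)
            nfa.add_eps(start, s)
            nfa.add_eps(a, accept)
    elif kind == "star":
        s, a = build(nfa, model[1])
        nfa.add_eps(start, s)
        nfa.add_eps(a, s)            # loop back for another repetition
        nfa.add_eps(start, accept)   # zero repetitions
        nfa.add_eps(a, accept)
    else:
        raise ValueError("unknown model node: %r" % (kind,))
    return start, accept


def closure(nfa, states):
    """Epsilon closure of a set of states."""
    stack, seen = list(states), set(states)
    while stack:
        for target in nfa.eps.get(stack.pop(), ()):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return frozenset(seen)


def matches(model, names):
    """True if the sequence of element names satisfies the content model."""
    nfa = NFA()
    start, accept = build(nfa, model)
    current = closure(nfa, {start})
    for name in names:
        moved = set()
        for state in current:
            moved |= nfa.step.get((state, name), set())
        current = closure(nfa, moved)
        if not current:
            return False
    return accept in current


class ChildNames(xml.sax.ContentHandler):
    """SAX handler that records the root element's direct children."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.names = []

    def startElement(self, name, attrs):
        if self.depth == 1:
            self.names.append(name)
        self.depth += 1

    def endElement(self, name):
        self.depth -= 1


def child_names(xml_text):
    handler = ChildNames()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.names


# Hypothetical content model: a title, then any mix of para and note.
MODEL = ("seq", ("elem", "title"),
         ("star", ("alt", ("elem", "para"), ("elem", "note"))))

good = child_names("<doc><title/><para/><note/><para/></doc>")
bad = child_names("<doc><para/><title/></doc>")
print(matches(MODEL, good))   # True
print(matches(MODEL, bad))    # False
```

Non-regular constraints (datatypes, identity constraints and so on)
would have to be checked in a separate pass after this one, as the
quoted point 3 suggests.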

-- 
Harry George                E-mail:  harry.g.george at boeing.com
The Boeing Company          Renton:  (425) 237-6915
P. O. Box 3707  02-CA       Everett: (425) 266-3868
Seattle, WA 98124-2207      Page:    (425) 631-8803  


