XML Schema?

Wed Feb 14 09:00:40 EST 2001

Harry George wrote:
> 
> Anyone have a python XML Schema parser/validator?  I thought I saw
> comments that it wasn't being done yet as part of xml-sig.  Of course,
> we don't actually need an XML Schema validator in python (java or C++
> renditions would do fine), but there is a social cachet to it, so
> maybe worth the effort.

I'm not personally a fan of XML Schemas, but I think this would be a
very worth-while project.  You'd probably get plenty of help as well.

> Assuming it is an open task, here is an approach.  Anyone see holes in
> this, besides it being a humongous task?
> 
> 1. Get the specs from OASIS-->W3C.
> 
> 2. Get test cases (for schemas and for instances).  There are a few
>    cases at xml-conf, but I think a lot more will be needed.  So I'll
>    need to generate them, and that suggests a case generator, plus of
>    course a test driver.  I have the testcase generator and driver
>    done.
> 
> 3. XML Schema is basically a regular expression problem, with nodes as
>    the "characters".

Hmm.  I wouldn't go this far.  The most basic parts of the content model
are so, but the entire data-type system and parts of the content model
need a different approach than regular grammar.
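
To make the data-type point concrete, here is a rough sketch of a
bounded-integer check.  The name check_integer and its min_inclusive /
max_inclusive parameters are made up for illustration and only loosely
modeled on schema facets; this is not the spec's exact lexical rules.
The lexical form is an ordinary character-level pattern, but the range
test works on the parsed value, which does not fit naturally into an
automaton whose "characters" are nodes.

```python
import re

# Lexical form of a plain integer: optional sign, then ASCII digits.
_INT_LEXICAL = re.compile(r"[+-]?[0-9]+\Z")


def check_integer(text, min_inclusive=None, max_inclusive=None):
    """Hypothetical helper: validate an integer literal against optional bounds."""
    text = text.strip()
    # Step 1: the lexical check is regular (a character-level pattern).
    if not _INT_LEXICAL.match(text):
        return False
    # Step 2: the range check is done on the parsed value, not on the text.
    value = int(text)
    if min_inclusive is not None and value < min_inclusive:
        return False
    if max_inclusive is not None and value > max_inclusive:
        return False
    return True
```

A check like this would presumably run at the leaf nodes, after the
content-model pass has accepted the element structure.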

>    So we can use classical lexer algorithms:
>    regexpr --> NFA --> DFA.  The hassles may be at the leaf nodes,
>    where XML Schema has lots of special cases.  I don't know if there
>    are non-re constraints in the specs, but if so I'd apply them after
>    the initial pass.

Interesting approach.
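
For the simple part of the content model, the idea might look roughly
like the sketch below.  It assumes a content model written as nested
tuples ("elem", "seq", "choice", "star", "opt"), which is an invented
notation for this example and not XML Schema syntax.  The model is
turned into an NFA with a Thompson-style construction, and the NFA is
then run over the list of child element names while tracking the set
of live states.  Each such set corresponds to one DFA state, so this
is the subset construction done lazily rather than up front.

```python
class NFA:
    """Minimal NFA: edges[state] is a list of (label, target); label None is epsilon."""

    def __init__(self):
        self.edges = []

    def new_state(self):
        self.edges.append([])
        return len(self.edges) - 1

    def add(self, src, label, dst):
        self.edges[src].append((label, dst))


def build(nfa, model):
    """Thompson-style construction; returns the (start, accept) states of the fragment."""
    kind = model[0]
    start, accept = nfa.new_state(), nfa.new_state()
    if kind == "elem":
        # A single child element with the given name.
        nfa.add(start, model[1], accept)
    elif kind == "seq":
        # Chain the parts one after another with epsilon edges.
        prev = start
        for part in model[1:]:
            s, a = build(nfa, part)
            nfa.add(prev, None, s)
            prev = a
        nfa.add(prev, None, accept)
    elif kind == "choice":
        # Branch into each alternative and rejoin at accept.
        for part in model[1:]:
            s, a = build(nfa, part)
            nfa.add(start, None, s)
            nfa.add(a, None, accept)
    elif kind in ("star", "opt"):
        s, a = build(nfa, model[1])
        nfa.add(start, None, s)
        nfa.add(start, None, accept)  # zero occurrences
        nfa.add(a, None, accept)
        if kind == "star":
            nfa.add(a, None, s)       # loop back for repetition
    else:
        raise ValueError("unknown model kind: %r" % (kind,))
    return start, accept


def closure(nfa, states):
    """Return all states reachable from `states` via epsilon edges."""
    stack = list(states)
    seen = set(states)
    while stack:
        state = stack.pop()
        for label, target in nfa.edges[state]:
            if label is None and target not in seen:
                seen.add(target)
                stack.append(target)
    return frozenset(seen)


def matches(model, child_names):
    """True if the sequence of child element names satisfies the content model."""
    nfa = NFA()
    start, accept = build(nfa, model)
    current = closure(nfa, {start})
    for name in child_names:
        moved = {t for s in current for (label, t) in nfa.edges[s] if label == name}
        current = closure(nfa, moved)
        if not current:
            return False
    return accept in current


# Example: a title, then any number of para elements, then an optional footer.
MODEL = ("seq", ("elem", "title"), ("star", ("elem", "para")), ("opt", ("elem", "footer")))
```

With that MODEL, matches(MODEL, ["title", "para", "para"]) accepts and
matches(MODEL, ["para"]) rejects.  Occurrence bounds, wildcards and the
like are left out here; they are where I would expect the special cases
to pile up.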

> 4. Given that state machine, run schemas through the parser until it can
>    build machines from valid schemas and detect invalid ones.
> 
> 5. Given a sound state machine, run instance test cases through the
>    package until it is passing valid instances and detecting invalid
>    ones.
> 
> 6. This would probably be an iterative enhancement exercise, once the
>    state machine engine was in place.
> 
> I have a lex-workalike I wrote in Modula-2, which I'll use as the
> start point.  Probably could use a SAX input approach ("next node"
> instead of "next char"), maybe with 1 lookahead.
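
A "next node" reader with one item of lookahead could be sketched on
top of the standard xml.sax module along these lines.  SAX pushes
events at a handler, so this version simply collects the events into a
list first and then wraps them in a small peekable reader; a real
implementation would more likely want a pull interface.  The names
EventCollector, Lookahead and node_stream are invented for this
example.

```python
import xml.sax


class EventCollector(xml.sax.ContentHandler):
    """Record element start/end events as ("start", name) / ("end", name) tuples."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


class Lookahead:
    """Wrap an iterable so the next item can be inspected without consuming it."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._buffer = []  # holds at most one item

    def peek(self):
        """Return the next item without consuming it, or None at end of input."""
        if not self._buffer:
            try:
                self._buffer.append(next(self._it))
            except StopIteration:
                return None
        return self._buffer[0]

    def next(self):
        """Consume and return the next item, or None at end of input."""
        item = self.peek()
        if self._buffer:
            self._buffer.pop()
        return item


def node_stream(xml_bytes):
    """Parse a document held in a bytes object and return a Lookahead over its events."""
    handler = EventCollector()
    xml.sax.parseString(xml_bytes, handler)
    return Lookahead(handler.events)
```

For example, node_stream(b"<doc><title/><para/></doc>") yields
("start", "doc") from peek() and again from the following next().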

Just to note: LT-XML supposedly has a Python interface and an XSchemas
validator.  I still think your effort would be worth-while, especially
given your fresh approach.

http://www.ltg.ed.ac.uk/software/xml/

-- 
Uche Ogbuji
Personal:   uche at ogbuji.net		http://uche.ogbuji.net
Work:       uche.ogbuji at fourthought.com	http://Fourthought.com