XML Schema?

Harry George hgg9140 at seanet.com
Tue Feb 13 22:21:41 EST 2001


Anyone have a python XML Schema parser/validator?  I thought I saw
comments that it wasn't being done yet as part of xml-sig.  Of course,
we don't actually need an XML Schema validator in Python (Java or C++
renditions would do fine), but there is a social cachet to it, so
maybe worth the effort.

Assuming it is an open task, here is an approach.  Anyone see holes in
this, besides it being a humongous task?

1. Get the specs from OASIS-->W3C.

2. Get test cases (for schemas and for instances).  There are a few
   cases at xml-conf, but I think a lot more will be needed.  So I'll
   need to generate them, and that suggests a case generator, plus of
   course a test driver.  I have the testcase generator and driver
   done.

3. XML Schema is basically a regular expression problem, with nodes as
   the "characters".  So we can use classical lexer algorithms:
   regexpr --> NFA --> DFA.  The hassles may be at the leaf nodes,
   where XML Schema has lots of special cases.  I don't know if there
   are non-re constraints in the specs, but if so I'd apply them after
   the initial pass.

4. Given that state machine, run schemas through the parser until it can
   build machines from valid schemas and detect invalid ones.

5. Given a sound state machine, run instance test cases through the
   package until it is passing valid instances and detecting invalid
   ones.

6. This would probably be an iterative enhancement exercise, once the
   state machine engine was in place.
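
To make step 3 concrete, here is a minimal sketch of the regexp --> NFA
--> DFA idea with element names standing in for characters.  The nested
tuple encoding of a content model ("sym", "seq", "alt", "star") and the
function names are made up for illustration; they are not taken from the
XML Schema spec, and leaf-node datatypes and occurrence bounds are
ignored entirely.

```python
from itertools import count

def build_nfa(model):
    """Thompson-style construction over element names (hypothetical model encoding)."""
    ids = count()
    sym = {}   # state -> list of (element_name, next_state)
    eps = {}   # state -> list of next_state reached without consuming a node

    def new():
        s = next(ids)
        sym[s], eps[s] = [], []
        return s

    def walk(node):
        kind = node[0]
        start, end = new(), new()
        if kind == "sym":                      # a single child element
            sym[start].append((node[1], end))
        elif kind == "seq":                    # children in order
            prev = start
            for child in node[1:]:
                c_start, c_end = walk(child)
                eps[prev].append(c_start)
                prev = c_end
            eps[prev].append(end)
        elif kind == "alt":                    # choice between children
            for child in node[1:]:
                c_start, c_end = walk(child)
                eps[start].append(c_start)
                eps[c_end].append(end)
        elif kind == "star":                   # zero or more repetitions
            c_start, c_end = walk(node[1])
            eps[start] += [c_start, end]
            eps[c_end] += [c_start, end]
        else:
            raise ValueError("unknown model node: %r" % (kind,))
        return start, end

    start, accept = walk(model)
    return start, accept, sym, eps

def closure(states, eps):
    """All NFA states reachable from `states` by epsilon moves alone."""
    stack, seen = list(states), set(states)
    while stack:
        for t in eps[stack.pop()]:
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return frozenset(seen)

def build_dfa(nfa):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    start, accept, sym, eps = nfa
    d_start = closure({start}, eps)
    table, todo = {}, [d_start]
    while todo:
        state = todo.pop()
        if state in table:
            continue
        moves = {}
        for s in state:
            for name, t in sym[s]:
                moves.setdefault(name, set()).add(t)
        table[state] = {name: closure(ts, eps) for name, ts in moves.items()}
        todo.extend(table[state].values())
    accepting = {s for s in table if accept in s}
    return d_start, table, accepting

def matches(dfa, names):
    """Run a sequence of child element names through the DFA."""
    state, table, accepting = dfa
    for name in names:
        state = table[state].get(name)
        if state is None:
            return False
    return state in accepting

# Content model: title, then any number of para, then note or ref.
model = ("seq", ("sym", "title"), ("star", ("sym", "para")),
         ("alt", ("sym", "note"), ("sym", "ref")))
dfa = build_dfa(build_nfa(model))
print(matches(dfa, ["title", "para", "para", "note"]))  # True
print(matches(dfa, ["title"]))                          # False
```

Any non-regexp constraints would then be checked in a separate pass
over instances that already got through the state machine.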

I have a lex-workalike I wrote in Modula-2, which I'll use as the
starting point.  It could probably take SAX-style input ("next node"
instead of "next char"), maybe with one node of lookahead.
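
As a rough illustration of the "next node" input, the sketch below uses
the standard xml.sax module to turn an instance document into per-element
sequences of child names, which is the input a content-model state
machine would consume.  The class and function names are hypothetical,
and lookahead, attributes and character data are left out.

```python
import xml.sax

class ChildSequenceHandler(xml.sax.ContentHandler):
    """Collect, for each element, the ordered names of its child elements."""

    def __init__(self):
        super().__init__()
        self.stack = []      # one (name, children) pair per open element
        self.finished = []   # completed (name, children) pairs, in closing order

    def startElement(self, name, attrs):
        if self.stack:
            # This start tag is the "next node" seen by the parent element.
            self.stack[-1][1].append(name)
        self.stack.append((name, []))

    def endElement(self, name):
        # The element is closed, so its child sequence is now complete.
        self.finished.append(self.stack.pop())

def child_sequences(xml_text):
    """Parse xml_text and return a list of (element_name, [child names])."""
    handler = ChildSequenceHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.finished

for name, children in child_sequences("<doc><title/><para/><para/></doc>"):
    print(name, children)
```

Each child list could be fed to the state machine for that element's
type as soon as the end tag arrives, so the whole document never has to
be held in memory.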



-- 
Harry George
hgg9140 at seanet.com


