XML Schema?
Romuald Texier
rtexier at elikya.com
Thu Feb 15 06:24:38 EST 2001
Did you take a look at http://4suite.org/ ?
Regards.
Romuald Texier.
Harry George wrote:
> Thanks for the pointer. I didn't find LTXML in my initial literature
> search. Given that it exists, I don't see much reason to continue on
> my effort. Maybe the xml-conf site could use the testcase generator.
>
> Another possibility I considered was to do a python binding to the
> apache Xerces "C++". Do you know if anyone has done that? That would
> hook into IBM's significant C++/Java XML-oriented releases.
>
> I'm not a fan of Schema either, but it sure is being hyped to the
> local decision makers -- so I need a python treatment. The whole XML
> world has migrated from "It is deliberately simple so all languages can
> play" to "Let's complexify it so those pesky GPL guys can't keep up."
>
> Uche Ogbuji <uche at ogbuji.net> writes:
>
> > Harry George wrote:
> > >
> > > Anyone have a python XML Schema parser/validator? I thought I saw
> > > comments that it wasn't being done yet as part of xml-sig. Of course,
> > > we don't actually need an XML Schema validator in python (java or C++
> > > renditions would do fine), but there is a social cachet to it, so
> > > maybe worth the effort.
> >
> > I'm not personally a fan of XML Schemas, but I think this would be a
> > very worth-while project. You'd probably get plenty of help as well.
> >
> > > Assuming it is an open task, here is an approach. Anyone see holes in
> > > this, besides it being a humongous task?
> > >
> > > 1. Get the specs from OASIS-->W3C.
> > >
> > > 2. Get test cases (for schemas and for instances). There are a few
> > > cases at xml-conf, but I think a lot more will be needed. So I'll
> > > need to generate them, and that suggests a case generator, plus of
> > > course a test driver. I have the testcase generator and driver
> > > done.
> > >
> > > 3. XML Schema is basically a regular expression problem, with nodes as
> > > the "characters".
> >
> > Hmm. I wouldn't go this far. The most basic parts of the content model
> > are so, but the entire data-type system and parts of the content model
> > need a different approach than regular grammar.
> >
> > > So we can use classical lexer algorithms:
> > > regexpr --> NFA --> DFA. The hassles may be at the leaf nodes,
> > > where XML Schema has lots of special cases. I don't know if there
> > > are non-re constraints in the specs, but if so I'd apply them after
> > > the initial pass.
> >
> > Interesting approach.
> >
> > > 4. Given that state machine, run schemas through the parser until it
> > > can build machines from valid schemas and detect invalid ones.
> > >
> > > 5. Given a sound state machine, run instance test cases through the
> > > package until it is passing valid instances and detecting invalid
> > > ones.
> > >
> > > 6. This would probably be an iterative enhancement exercise, once the
> > > state machine engine was in place.
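A minimal Python sketch of the regexpr --> NFA --> DFA pipeline from step 3, treating element names as the "characters". It assumes a toy content-model encoding (nested tuples for sequence, choice, star and optional); the tuple format and the names `to_nfa`, `to_dfa` and `accepts` are hypothetical, not taken from any library or from the Modula-2 lex-workalike. It uses Thompson construction followed by subset construction and covers only element-name content models; as noted above, datatypes and other constraints would need separate handling.

```python
from itertools import count

# Hypothetical content-model AST:
#   ("name", "a"), ("seq", m1, m2, ...), ("choice", m1, m2, ...),
#   ("star", m), ("opt", m)

def to_nfa(model):
    """Thompson construction: returns (start, accept, transitions, epsilons)."""
    ids = count()
    trans = {}  # state -> list of (element_name, next_state)
    eps = {}    # state -> list of next_state reachable by epsilon moves

    def new():
        s = next(ids)
        trans[s], eps[s] = [], []
        return s

    def build(m):
        kind = m[0]
        if kind == "name":
            s, f = new(), new()
            trans[s].append((m[1], f))
            return s, f
        if kind == "seq":
            s = f = new()
            for sub in m[1:]:
                ss, sf = build(sub)
                eps[f].append(ss)  # chain each particle after the previous one
                f = sf
            return s, f
        if kind == "choice":
            s, f = new(), new()
            for sub in m[1:]:
                ss, sf = build(sub)
                eps[s].append(ss)
                eps[sf].append(f)
            return s, f
        if kind in ("star", "opt"):
            s, f = new(), new()
            ss, sf = build(m[1])
            eps[s] += [ss, f]       # enter the particle or skip it
            eps[sf].append(f)
            if kind == "star":
                eps[sf].append(ss)  # loop back for repetition
            return s, f
        raise ValueError("unknown model kind: %r" % (kind,))

    start, accept = build(model)
    return start, accept, trans, eps

def closure(states, eps):
    """Epsilon closure of a set of NFA states."""
    stack, seen = list(states), set(states)
    while stack:
        for t in eps[stack.pop()]:
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return frozenset(seen)

def to_dfa(model):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    start, accept, trans, eps = to_nfa(model)
    d_start = closure([start], eps)
    table, todo = {}, [d_start]
    while todo:
        S = todo.pop()
        if S in table:
            continue
        moves = {}
        for s in S:
            for name, t in trans[s]:
                moves.setdefault(name, set()).add(t)
        table[S] = {name: closure(ts, eps) for name, ts in moves.items()}
        todo.extend(table[S].values())
    finals = {S for S in table if accept in S}
    return d_start, table, finals

def accepts(dfa, children):
    """Run a sequence of child element names through the DFA."""
    state, table, finals = dfa
    for name in children:
        state = table[state].get(name)
        if state is None:
            return False
    return state in finals

if __name__ == "__main__":
    # Content model (a, b*, c?) in the toy AST.
    model = ("seq", ("name", "a"), ("star", ("name", "b")), ("opt", ("name", "c")))
    dfa = to_dfa(model)
    # Valid and invalid child sequences, in the spirit of steps 4 and 5.
    cases = [(["a", "b", "b", "c"], True), (["a"], True),
             (["b"], False), (["a", "c", "b"], False), ([], False)]
    for children, expected in cases:
        assert accepts(dfa, children) == expected, children
    print("all cases passed")
```

The small case table at the end doubles as a toy test driver: each sequence is paired with the verdict the machine should give.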
> > >
> > > I have a lex-workalike I wrote in Modula-2, which I'll use as the
> > > starting point. Probably could use a SAX input approach ("next node"
> > > instead of "next char"), maybe with 1 lookahead.
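A rough sketch of the SAX "next node" input idea using Python's standard `xml.sax` module. `ContentModelHandler`, `regex_model` and `validate` are hypothetical names, and the per-element checker is an assumption: here it is a regular expression over space-separated child element names. Rather than stepping a state machine on every event (and without the one-node lookahead), it buffers each element's child names on a stack and checks them when the element closes; a streaming version would instead advance a DFA state on each `startElement`.

```python
import re
import xml.sax

class ContentModelHandler(xml.sax.ContentHandler):
    """Collects each element's child names and checks them on close.

    `models` maps an element name to a callable that takes the list of
    child element names and returns True if the sequence is allowed.
    """

    def __init__(self, models):
        super().__init__()
        self.models = models
        self.stack = []   # one list of child names per open element
        self.errors = []  # (element_name, children) pairs that failed

    def startElement(self, name, attrs):
        if self.stack:
            self.stack[-1].append(name)  # the "next node" seen by the parent
        self.stack.append([])

    def endElement(self, name):
        children = self.stack.pop()
        check = self.models.get(name)
        if check is not None and not check(children):
            self.errors.append((name, children))

def regex_model(pattern):
    """Checker built from a regex over space-separated child names."""
    compiled = re.compile(pattern)
    return lambda names: compiled.fullmatch(" ".join(names)) is not None

def validate(xml_bytes, models):
    handler = ContentModelHandler(models)
    xml.sax.parseString(xml_bytes, handler)
    return handler.errors

if __name__ == "__main__":
    models = {"doc": regex_model(r"a( b)*( c)?")}  # roughly (a, b*, c?)
    print(validate(b"<doc><a/><b/><c/></doc>", models))
    print(validate(b"<doc><b/></doc>", models))
```

Elements with no entry in `models` are not checked, which keeps the sketch small.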
> >
> > Just to note: LT-XML supposedly has a Python interface and an XSchemas
> > validator. I still think your effort would be worth-while, especially
> > given your fresh approach.
> >
> > http://www.ltg.ed.ac.uk/software/xml/
> >
> >
> > --
> > Uche Ogbuji
> > Personal: uche at ogbuji.net http://uche.ogbuji.net
> > Work: uche.ogbuji at fourthought.com http://Fourthought.com
>
--
Romuald Texier