[XML-SIG] dumping an XML parser skeleton from DTD input

Thomas B. Passin tpassin@home.com
Sat, 10 Mar 2001 10:10:27 -0500


<Eugene.Leitl@lrz.uni-muenchen.de> wrote -

You are mixing up several concepts or processing steps.

1) Parsing xml.
This means getting hold of the structural elements of the xml document and
handing them to another application for further processing.  There are many
xml parsers out there, some command line and some not.  It's almost certainly
not worth it to roll your own.
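For instance, with a parser that ships with Python today (the standard-library
xml.sax module, not PyXML itself), you only write the part that reacts to the
structure the parser reports.  A minimal sketch; the ElementLister class and
the sample order document are made up for illustration:

```python
import xml.sax


class ElementLister(xml.sax.ContentHandler):
    """Records every start tag the parser reports, with its attributes."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # attrs is a read-only attribute mapping; copy it into a plain dict
        self.events.append((name, dict(attrs.items())))


# Hypothetical sample document, used only for illustration.
DOC = b"""<order id="42">
  <item sku="A1">Widget</item>
  <item sku="B2">Gadget</item>
</order>"""

handler = ElementLister()
xml.sax.parseString(DOC, handler)  # the library does the actual parsing
for name, attrs in handler.events:
    print(name, attrs)
```

The parser handles well-formedness checks, entities and character encoding;
the handler only sees the elements.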

2) Creating a tree-like structure to represent the structure of the xml
document.
The DOM is an API for a tree-like representation.  Most major parsers out
there either include a DOM api or can work with another DOM API.  (SAX is a
non-DOM api, but the output of a SAX processor can be used to build a tree,
too.)  The DOM is an object-oriented api.
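To make the SAX point concrete, here is a rough sketch of a handler that turns
SAX events into a tree of objects, the same kind of object tree a home-grown
recursive parser builds.  Node and ObjectTreeBuilder are hypothetical names; it
uses the standard-library xml.sax module:

```python
import xml.sax


class Node:
    """A minimal, hypothetical tree node: tag, attributes, text, children."""

    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = attrs
        self.text = ""
        self.children = []


class ObjectTreeBuilder(xml.sax.ContentHandler):
    """Turns the flat stream of SAX events into a tree of Node objects."""

    def __init__(self):
        super().__init__()
        self.root = None
        self.stack = []  # currently open elements, innermost last

    def startElement(self, name, attrs):
        node = Node(name, dict(attrs.items()))
        if self.stack:
            self.stack[-1].children.append(node)  # attach to the parent
        else:
            self.root = node  # first element seen is the document element
        self.stack.append(node)

    def characters(self, content):
        # SAX may deliver text in several chunks, so accumulate it
        if self.stack:
            self.stack[-1].text += content

    def endElement(self, name):
        self.stack.pop()


builder = ObjectTreeBuilder()
xml.sax.parseString(b"<order><item>Widget</item><item>Gadget</item></order>", builder)
root = builder.root
print(root.tag, [(c.tag, c.text) for c in root.children])
```

The stack is what replaces the hand-written recursion: each start tag pushes a
node, each end tag pops one.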

3) DOM manipulation, using the DOM api.  There are already good processors
that can use the DOM api to manipulate actual, populated DOM trees.  So don't
roll your own there, either.
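As an illustration, Python's standard-library xml.dom.minidom exposes the usual
DOM calls (createElement, setAttribute, appendChild, getElementsByTagName).  A
minimal sketch with a made-up document:

```python
from xml.dom import minidom

doc = minidom.parseString("<order><item sku='A1'>Widget</item></order>")
order = doc.documentElement

# Add a new <item> element through the standard DOM methods
item = doc.createElement("item")
item.setAttribute("sku", "B2")
item.appendChild(doc.createTextNode("Gadget"))
order.appendChild(item)

# Modify an existing element found by tag name
first = order.getElementsByTagName("item")[0]
first.setAttribute("qty", "3")

# Serialize the modified tree back to xml text
print(order.toxml())
```

None of this requires writing a parser or a tree class; the same calls work on
any DOM implementation that follows the W3C interfaces.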

4) You don't need a DTD, but it's a good idea to make one anyway because then
you can use a validating parser to check that the first xml examples that you
build are "valid" - i.e., put together correctly from a structural point of
view.  It's amazing how easy it is to accidentally create something other than
what you thought you were making.

Otherwise, you can start simple with no DTD and later define one after you
have some hands-on experience working with xml.

As Martin said, the Python PyXML package is very good.  There's also the
Microsoft xml processor, which can be programmed as a COM object, from
VBScript, or from JavaScript.  There are several good Java processors, and
some good Perl ones.  Python would be the quickest and easiest to use,
especially if you are not already up to speed in one of the other languages.
Even if you are, Python will be faster and easier to use than one of the
strongly typed compiled languages like Java.

Get a good book or two, like Wrox's Professional XML and XML in a Nutshell
from O'Reilly, to mention only two of the good ones out there.

>
> The company I'm with has the following ad hoc approach to XML:
> whip up some XML fitting the problem, don't bother with writing
> a DTD, code up a parser in an OO language, which recursively
> reads the tags into memory, creating a hierarchy/tree of objects.
> Fill in methods to deal with the data sitting in the tree, finis.
>
> I looked at the way other people parse XML, and ran into DOM, which seemed
> to imply the company has reinvented the wheel.
>
Yes, the wheel has already been invented.  But core dumps aren't going to be
very useful.  Work through examples from a book or tutorial site, fix them
until they run correctly, then start morphing them closer to what you want to
do.  You don't need
to try to understand a DOM tree from a core dump.  Learn about the api
instead.
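For example, a few DOM calls are enough to print an outline of a parsed tree,
which tells you far more than a dump would.  This sketch assumes the
standard-library xml.dom.minidom; the outline function is a hypothetical
helper:

```python
from xml.dom import Node, minidom


def outline(node, depth=0, lines=None):
    """Return an indented outline of the element tree, using only DOM calls."""
    if lines is None:
        lines = []
    if node.nodeType == Node.ELEMENT_NODE:
        attrs = " ".join(f'{k}="{v}"' for k, v in node.attributes.items())
        lines.append("  " * depth + node.tagName + (" " + attrs if attrs else ""))
        depth += 1
    elif node.nodeType == Node.TEXT_NODE and node.data.strip():
        # skip whitespace-only text nodes left over from indentation
        lines.append("  " * depth + repr(node.data.strip()))
    for child in node.childNodes:
        outline(child, depth, lines)
    return lines


doc = minidom.parseString("<order id='42'>\n  <item>Widget</item>\n</order>")
print("\n".join(outline(doc)))
```

Walking childNodes and checking nodeType is the core of most DOM code, so it
is a good first thing to learn.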

Cheers,

Tom P