[XML-SIG] parsers and XML

Lars Marius Garshol larsga@garshol.priv.no
21 Aug 2000 16:20:17 +0200


* Lars Marius Garshol
|
| This is so because XML has a much simpler structure (and potentially
| much greater sizes) than what parsers traditionally have parsed.

* travish@realtime.net
| 
| I'm not so sure; I've compiled very large C files before.

Upwards of 100 MB? In a single file?
 
* Lars Marius Garshol
|
| This makes an event-based API very useful.
 
* travish@realtime.net
|
| The "event-based API" bears a striking resemblance to a lexer, and
| is usually only useful if you do a certain amount of state-tracking
| yourself.  (e.g. how many levels of tags deep am I, and which tags
| are they?)  That is the traditional role of a parser, and the
| "event-driven API" apparently does none of it.

This is all correct, but XML documents and computer programs have very
different uses and structures, so it is most productive to set aside
how things are done in compilers and start with a clean slate when
learning XML.
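
To make the state-tracking point concrete, here is a minimal sketch of
what an application has to do on top of an event-based API to know how
deep it is and inside which tags. It assumes the SAX2-style handler
interface of Python's standard xml.sax module; DepthTracker and
element_paths are hypothetical names used only for illustration.

```python
import xml.sax


class DepthTracker(xml.sax.ContentHandler):
    """Tracks the open elements by hand; the parser only emits events."""

    def __init__(self):
        super().__init__()
        self.stack = []   # names of the currently open elements
        self.paths = []   # one "a/b/c" style path per start tag seen

    def startElement(self, name, attrs):
        # Push on every start tag so the stack mirrors the nesting.
        self.stack.append(name)
        self.paths.append("/".join(self.stack))

    def endElement(self, name):
        # Pop on every end tag; well-formedness guarantees it matches.
        self.stack.pop()


def element_paths(xml_bytes):
    """Return the path of every element, in document order."""
    handler = DepthTracker()
    xml.sax.parseString(xml_bytes, handler)
    return handler.paths
```

The stack is the only state kept, so memory use grows with nesting
depth rather than with document size.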
 
| Actually, I want something between the two APIs that appear to be
| present (lexing and generating an AST).  For example, in the reduce
| phase of a shift-reduce parser like yacc (which corresponds to a
| close-tag event from an "event driven API"), one is given the
| ability to 'condense' all of the subtrees of this particular node,
| requiring neither a full AST nor keeping track of the stack of
| nested tags you may currently be processing in.  This would be
| extremely handy for (e.g.) converting XML to nested data structures.

This is a very reasonable wish, and such tools already exist.
Pyxie and eventdom can both do this.
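
The "reduce at the close tag" idea can also be sketched directly on
top of SAX events. The following is only an illustration under stated
assumptions, not how Pyxie or eventdom work: it uses the standard
xml.sax module, and Reducer, to_nested and the reduce callback are
hypothetical names. The callback sees only the already condensed
children of the element being closed, much like a yacc action.

```python
import xml.sax


def as_tuple(name, text, children):
    """Default reduction: a (name, text, children) tuple per element."""
    return (name, text, children)


class Reducer(xml.sax.ContentHandler):
    """Calls reduce(name, text, children) at each end tag."""

    def __init__(self, reduce):
        super().__init__()
        self.reduce = reduce
        # Sentinel frame that receives the reduced document element.
        self.frames = [{"children": [], "text": []}]

    def startElement(self, name, attrs):
        # Open a fresh frame for the new element (the "shift").
        self.frames.append({"children": [], "text": []})

    def characters(self, content):
        # Character data may arrive in several chunks; collect them.
        self.frames[-1]["text"].append(content)

    def endElement(self, name):
        # Condense the finished subtree and hand it to the parent frame.
        frame = self.frames.pop()
        text = "".join(frame["text"]).strip()
        value = self.reduce(name, text, frame["children"])
        self.frames[-1]["children"].append(value)


def to_nested(xml_bytes, reduce=as_tuple):
    """Parse xml_bytes and return the reduced document element."""
    handler = Reducer(reduce)
    xml.sax.parseString(xml_bytes, handler)
    return handler.frames[0]["children"][0]
```

The stack still exists, but it lives inside the helper; the code
supplied by the user never has to inspect it, and no full tree is kept
unless the reduce callback chooses to build one.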
 
* Lars Marius Garshol
|
| The diffs seem to be for the pyexpat driver. This has nothing to do
| with sgmlop or xmllib. 
 
* travish@realtime.net
|
| Perhaps you should look a little more carefully before sending back
| such a pointed response.

I'm sorry if I have offended you. My response was not at all intended
to be pointed.
 
* Lars Marius Garshol
|
| What is the problem with the description?
 
* travish@realtime.net
|
| For one thing, it appears that the character accumulation callback has
| a different signature than the other parsers, passing only one argument
| instead of three (charstr, start, len).  If so, that hardly makes sgmlop
| replace the other parsers invisibly.

It seems that you have confused the levels here. sgmlop's native API
(not the SAX driver) is intended to be a drop-in replacement for xmllib.
Of course, the SAX drivers are all intended to be (as far as possible)
interchangeable, so if you've found a version of the sgmlop driver
that has this problem I would be glad to hear where you got it.
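
The difference between the two levels can be illustrated with a small
adapter. This is a sketch under assumptions, not code from any actual
sgmlop driver: it assumes a low-level parser that reports character
data through a one-argument handle_data(data) callback (the xmllib
convention) and a handler that expects the three-argument
characters(ch, start, length) form mentioned above. Both class names
are hypothetical.

```python
class CharacterAdapter:
    """Forwards one-argument data callbacks to a three-argument handler."""

    def __init__(self, handler):
        # handler is any object with characters(ch, start, length).
        self.handler = handler

    def handle_data(self, data):
        # The whole string is the data, so start is 0 and length is len(data).
        self.handler.characters(data, 0, len(data))


class Collector:
    """Example handler using the three-argument signature."""

    def __init__(self):
        self.chunks = []

    def characters(self, ch, start, length):
        # Only the slice [start, start + length) is the actual data.
        self.chunks.append(ch[start:start + length])
```

With such a layer in between, the application sees the same callback
signature regardless of which low-level parser sits underneath.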

--Lars M.