[XML-SIG] parsers and XML

Lars Marius Garshol larsga@garshol.priv.no
08 Aug 2000 14:05:52 +0200


* travish@realtime.net
| 
| a) most of the XML "parsers" appear to be lexers

You mean, since they don't build complete document trees? That is
because XML has a much simpler structure than what parsers have
traditionally handled, while the documents can be much larger. Both
points make an event-based API very useful.
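
To illustrate the event-based style, here is a minimal sketch. It
assumes the xml.sax module from the current Python standard library
rather than the saxlib package discussed in this thread, and the
DepthHandler class, the scan function and the sample document are
made up for the example. The handler collects element counts and the
maximum nesting depth without ever holding a tree in memory.

```python
import xml.sax


class DepthHandler(xml.sax.ContentHandler):
    """Hypothetical handler: counts elements and tracks nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0
        self.counts = {}

    def startElement(self, name, attrs):
        # Called once per start tag, in document order.
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        self.counts[name] = self.counts.get(name, 0) + 1

    def endElement(self, name):
        # Called once per end tag; we are leaving one nesting level.
        self.depth -= 1


def scan(xml_bytes):
    """Parse xml_bytes and return (max_depth, per-element counts)."""
    handler = DepthHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.max_depth, handler.counts


# Made-up sample document, only for illustration.
SAMPLE = b"<doc><sec><p>one</p><p>two</p></sec><sec><p>three</p></sec></doc>"
max_depth, counts = scan(SAMPLE)
```

Memory use here depends on the nesting depth and the number of
distinct element names, not on the size of the document.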

In Python we have so far chosen to provide tree building as separate
utilities.  If you want a document tree, look at 4DOM or qp_xml.
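
As a rough sketch of how a tree builder can sit on top of an
event-based parser as a separate utility: the code below is not how
4DOM or qp_xml work internally. It assumes the standard-library
xml.sax module, and the TreeBuilder class, the build_tree function
and the dictionary layout of the nodes are all invented for the
example. Recursive nesting is handled with an explicit stack of open
elements.

```python
import xml.sax


class TreeBuilder(xml.sax.ContentHandler):
    """Hypothetical builder: turns SAX events into nested dictionaries."""

    def __init__(self):
        super().__init__()
        self.root = None
        self.stack = []  # elements that are currently open

    def startElement(self, name, attrs):
        node = {
            "name": name,
            "attrs": dict(attrs.items()),
            "children": [],
            "text": "",
        }
        if self.stack:
            # Attach the new node to the innermost open element.
            self.stack[-1]["children"].append(node)
        else:
            self.root = node
        self.stack.append(node)

    def characters(self, content):
        # Character data may arrive in several chunks, so accumulate it.
        if self.stack:
            self.stack[-1]["text"] += content

    def endElement(self, name):
        self.stack.pop()


def build_tree(xml_bytes):
    """Parse xml_bytes and return the root node as a nested dictionary."""
    builder = TreeBuilder()
    xml.sax.parseString(xml_bytes, builder)
    return builder.root


# Made-up sample document with nested elements.
tree = build_tree(b'<a id="1"><b>hi</b><b><c/></b></a>')
```

The parser itself stays unchanged. Only the handler decides whether a
tree gets built.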
 
| b) none of the examples are of sufficient/substantial complexity
|    (e.g. recursive nesting, deep/complex hierarchy)
| 
|    If anyone has suggestions on what kind of parser to use as a back
|    end (yapps?  kjParsing?  etc.) I'd be interested to hear it.

I don't understand this question. 

| c) SGMLOP's description is substantially misleading:
| 
|  http://www.garshol.priv.no/download/xmltools/prod/sgmlop.html
| 
|  sgmlop is meant to behave identically with the sgmllib and xmllib
|  modules and replace them invisibly if it is present, so that one
|  does not have to change any code to use them. The saxlib package
|  has a SAX 1.0 driver for sgmlop.
| 
| This does not appear to be correct.  See diffs below.

The diffs seem to be for the pyexpat driver. This has nothing to do
with sgmlop or xmllib. 

What is the problem with the description?

| d) xmltok: no driver drv_xmltok
| e) XMLtoolkit: no module named XMLFactory
| f) xmldc: no module named xml_dc

If you don't have the parsers installed, the drivers won't work. :)

| g) Relative speeds on my hardware:
| 	sgmlop (C module)      4.15
| 	pyexpat (C module)     2.60
| 	xmlproc (python)       1.21
| 	xmllib (python)        1.00

Relative speed depends quite a bit on the document being parsed.
Also, the speed difference when using sgmlop is probably greater when
you don't use SAX.
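
A rough way to see how much a SAX layer costs for a given document is
sketched below. Since sgmlop is not part of the standard library, the
sketch uses expat as a stand-in: once through xml.parsers.expat
directly and once through xml.sax, which is backed by expat in the
standard library. The make_doc, count_direct, count_sax and compare
names and the generated test document are made up for the example,
and the timings will vary with the document and the machine.

```python
import timeit
import xml.sax
from xml.parsers import expat


def make_doc(n):
    """Build a synthetic document with n <item> elements (made-up data)."""
    return ("<doc>" + "<item a='1'>text</item>" * n + "</doc>").encode("utf-8")


def count_direct(data):
    """Count start tags using the expat parser's native callbacks."""
    count = 0

    def start(name, attrs):
        nonlocal count
        count += 1

    parser = expat.ParserCreate()
    parser.StartElementHandler = start
    parser.Parse(data, True)
    return count


class Counter(xml.sax.ContentHandler):
    """Count start tags through the SAX interface."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def count_sax(data):
    handler = Counter()
    xml.sax.parseString(data, handler)
    return handler.count


def compare(n=2000, repeat=5):
    """Return total seconds for `repeat` parses with each approach."""
    data = make_doc(n)
    return {
        "expat direct": timeit.timeit(lambda: count_direct(data), number=repeat),
        "expat via SAX": timeit.timeit(lambda: count_sax(data), number=repeat),
    }


if __name__ == "__main__":
    for label, seconds in compare().items():
        print(f"{label:15s} {seconds:.4f} s")
```

Changing make_doc to produce a different mix of elements, attributes
and text is a quick way to check how strongly the ratio depends on the
document.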

--Lars M.