[XML-SIG] SAX prettyprinter V2 and SGMLOP

Lars Marius Garshol larsga@ifi.uio.no
25 Jan 1999 22:18:29 +0100


* Christian Tismer
| 
| [About ot.xml]
|
| Interesting. I tested my Indenter with this file (what a nice
| example).

A rather misleading one, I'm afraid, since it doesn't use entities,
comments, PIs, marked sections or attributes, only elements and
PCDATA.

| It takes 11.75 seconds to indent this through SAX, using sgmlop.
| With xmlproc, it takes 30.87 seconds.  

Interesting. (And pleasing. :)

| Running the whole text through sgmlop without any associated events
| ran in below one second.
 
It's worth noting that this is just the time for the raw parse. As far
as I know, sgmlop will not call handlers if there aren't any, so this
entire second is spent in C code.
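
To see the difference between a raw parse and a parse with callbacks,
here is a minimal sketch. It uses the standard library's
xml.parsers.expat as a stand-in, since I am not assuming sgmlop is
installed; the document generator and all function names are made up
for illustration, and the timings will of course differ from the
ot.xml figures quoted above.

```python
import time
import xml.parsers.expat


def make_document(n_records=20000):
    # Synthetic test data: elements and PCDATA only, like ot.xml.
    body = "".join("<v>verse %d</v>" % i for i in range(n_records))
    return ("<doc>" + body + "</doc>").encode("utf-8")


def raw_parse(data):
    # No handlers are registered, so the parser never calls back
    # into Python; all the work happens inside the C parser.
    parser = xml.parsers.expat.ParserCreate()
    parser.Parse(data, True)


def counting_parse(data):
    # Each start tag and each chunk of text triggers a Python callback.
    counts = {"elements": 0, "chars": 0}

    def start(name, attrs):
        counts["elements"] += 1

    def chars(text):
        counts["chars"] += len(text)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.Parse(data, True)
    return counts


def timed(func, data):
    # Return (elapsed seconds, result of func).
    t0 = time.perf_counter()
    result = func(data)
    return time.perf_counter() - t0, result


if __name__ == "__main__":
    data = make_document()
    print("raw parse:     %.3f s" % timed(raw_parse, data)[0])
    print("with handlers: %.3f s" % timed(counting_parse, data)[0])
```

The gap between the two numbers is roughly the cost of crossing from C
into Python for every event, which is what the SAX driver and the
application handlers add on top of the raw parse.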

| I want to validate small amounts of newly added data "records" which
| are in XML format, but then kept in a special repository, and I want
| to be able to re-import large amounts of XML which were exported by
| my app before. This means, I need a validating parser of acceptable
| speed, where I think xmlproc is very good? 

I think the Java parsers are probably faster, but xmlproc should be
acceptable, yes. 

When I release 0.60, the DTD parser and DTD objects will be separated
from the XML parser. This means that, provided you can get the external
and internal DTD subsets from expat, it will be possible to build an
expat-based validator using the xmlproc sources. This will require a
bit of work, though.
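
As a rough illustration of what "getting the DTD subset from expat"
could look like, here is a sketch using the declaration handlers that
the pyexpat module in current Python versions exposes. This is an
assumption about a present-day interface, not a description of the
expat bindings available at the time of writing, and it only collects
the declarations; it does not validate anything or touch the xmlproc
sources. The sample document and the collect_declarations name are
hypothetical.

```python
import xml.parsers.expat

# Small sample document with an internal DTD subset (made up).
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE book [
  <!ELEMENT book (title, para+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT para (#PCDATA)>
  <!ATTLIST book lang CDATA #IMPLIED>
]>
<book lang="en"><title>T</title><para>P</para></book>
"""


def collect_declarations(data):
    # Gather the DOCTYPE, element and attribute-list declarations
    # that expat reports while parsing the internal subset.
    found = {"doctype": None, "elements": {}, "attlists": []}

    def start_doctype(name, system_id, public_id, has_internal_subset):
        found["doctype"] = (name, system_id, public_id,
                            bool(has_internal_subset))

    def element_decl(name, model):
        # 'model' is expat's nested-tuple form of the content model.
        found["elements"][name] = model

    def attlist_decl(elname, attname, type_, default, required):
        found["attlists"].append(
            (elname, attname, type_, default, bool(required)))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = start_doctype
    parser.ElementDeclHandler = element_decl
    parser.AttlistDeclHandler = attlist_decl
    parser.Parse(data, True)
    return found


if __name__ == "__main__":
    decls = collect_declarations(DOC)
    print(decls["doctype"])
    print(sorted(decls["elements"]))
    print(decls["attlists"])
```

A validator would feed these declarations into its own DTD objects and
then check each element event against them. The external subset needs
extra handling that is not shown here.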

With DTD caching (scheduled for 0.61 in my current plans) you won't
have to keep reparsing the DTD for each document either, which saves
even more time. (Parsing large DTDs such as TEI-XML takes a
substantial amount of time.)
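
The idea behind DTD caching can be sketched in a few lines. This is
not the xmlproc design, just a plain in-memory cache keyed on the
system identifier; DTDCache, fake_parse_dtd and the "tei.dtd" key are
all hypothetical names, and the parse function is a stand-in for
whatever expensive DTD parse is actually used.

```python
class DTDCache:
    """Cache parsed DTD objects so each DTD is parsed only once."""

    def __init__(self, parse_dtd):
        # parse_dtd: callable taking a system identifier and
        # returning a parsed DTD object (assumed to be expensive).
        self._parse_dtd = parse_dtd
        self._cache = {}
        self.misses = 0

    def get(self, system_id):
        # Parse on first request only; afterwards return the
        # already-parsed object for the same system identifier.
        if system_id not in self._cache:
            self.misses += 1
            self._cache[system_id] = self._parse_dtd(system_id)
        return self._cache[system_id]


def fake_parse_dtd(system_id):
    # Stand-in for a real DTD parse; returns a dummy DTD object.
    return {"system_id": system_id, "elements": {}}


if __name__ == "__main__":
    cache = DTDCache(fake_parse_dtd)
    for _ in range(3):
        dtd = cache.get("tei.dtd")
    print("DTD parsed %d time(s)" % cache.misses)
```

A real cache would also have to notice when the DTD file changes, and
would only help across runs if the parsed objects were saved to disk.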

--Lars M.