[DOC-SIG] What does this mean for Python?

Lars Marius Garshol larsga@ifi.uio.no
11 Mar 1998 17:55:39 +0100


* Paul Prescod
| 
| Many people equate CGI and Perl. I would hate to see that happen
| with XML and hope I can help to stop that from happening. In the
| short term, I will integrate JPython with a Java XML parser and
| write a tutorial on how to use that

For the last week I've been working on a validating XML parser in pure
Python that builds a grove-like tree. Initially, I built on xmllib and
added a DTD parser, a catalog file handler and some grove objects. I'm
now making my own replacement for xmllib (xmlproc), which I'm planning
to integrate with the existing validator/grove builder.

At this stage I'm able to produce basic ESIS output from xmlproc and
the validator/tree builder is advanced enough to parse Tim Bray's plays
and religious texts. In fact I wrote a small script that went through
the tree and counted the number of speeches and lines for each
character in a play.[1]
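
For the curious, the counting can be sketched roughly like this. This
is not my actual script: it works from SAX-style events rather than
the tree, it uses the xml.sax module from the current Python standard
library as a stand-in for xmlproc, and the element names SPEECH,
SPEAKER and LINE as well as the sample text are assumptions.

```python
import xml.sax
from collections import defaultdict


class SpeechCounter(xml.sax.ContentHandler):
    """Count speeches and lines per speaker (element names are assumed)."""

    def __init__(self):
        super().__init__()
        self.speeches = defaultdict(int)
        self.lines = defaultdict(int)
        self._speakers = []   # speakers named in the current SPEECH
        self._text = None     # text buffer, active only inside SPEAKER

    def startElement(self, name, attrs):
        if name == "SPEECH":
            self._speakers = []
        elif name == "SPEAKER":
            self._text = []

    def characters(self, content):
        # Character data may arrive in several chunks, so buffer it.
        if self._text is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "SPEAKER":
            speaker = "".join(self._text).strip()
            self._text = None
            self._speakers.append(speaker)
            self.speeches[speaker] += 1
        elif name == "LINE":
            # Credit the line to every speaker of the enclosing speech.
            for speaker in self._speakers:
                self.lines[speaker] += 1


# Made-up sample data, only there to exercise the handler.
SAMPLE = b"""<PLAY>
<SPEECH><SPEAKER>HAMLET</SPEAKER><LINE>a</LINE><LINE>b</LINE></SPEECH>
<SPEECH><SPEAKER>HORATIO</SPEAKER><LINE>c</LINE></SPEECH>
<SPEECH><SPEAKER>HAMLET</SPEAKER><LINE>d</LINE></SPEECH>
</PLAY>"""

counter = SpeechCounter()
xml.sax.parseString(SAMPLE, counter)
for speaker in sorted(counter.speeches):
    print(speaker, counter.speeches[speaker], counter.lines[speaker])
```

Since nothing is kept in memory except the counters, this variant
avoids building the tree at all, which is one reason an event-based
API is attractive for simple jobs like this.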

The downside is that it takes 50 seconds to parse Hamlet (275 KB) on
my 166 MHz Pentium running Win95, which is much too slow.

This is how I envisioned my XML package:

1) SAX driver for xmllib
2) xmlproc uses SAX natively instead of using a driver, although it
   will probably need to add some things beyond SAX later

That gives us well-formedness-checking and a simple standardized
event-based API. 

Building on that I'd planned on making:

1) A simple ESIS outputter, for demo/testing purposes.
2) A grove builder, eventually with DOM support, although there are
   things I dislike about DOM.
3) A validator.
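
To give an idea of how small item 1 can be, here is a minimal sketch
of an ESIS-style outputter sitting on top of SAX events. It is only an
illustration: it uses xml.sax from the current standard library in
place of xmlproc, the output loosely follows the nsgmls line
conventions ("(" for start tags, ")" for end tags, "-" for data, "A"
for attributes), it reports every attribute as CDATA, and the names
ESISWriter and to_esis are made up for this example.

```python
import io
import xml.sax


class ESISWriter(xml.sax.ContentHandler):
    """Write ESIS-like lines for SAX events (simplified sketch)."""

    def __init__(self, out):
        super().__init__()
        self.out = out

    def startElement(self, name, attrs):
        # Attribute lines come before the start-tag line.
        for attr in sorted(attrs.getNames()):
            self.out.write("A%s CDATA %s\n" % (attr, attrs.getValue(attr)))
        self.out.write("(%s\n" % name)

    def endElement(self, name):
        self.out.write(")%s\n" % name)

    def characters(self, content):
        # Escape backslashes and newlines so each event stays on one line.
        escaped = content.replace("\\", "\\\\").replace("\n", "\\n")
        self.out.write("-%s\n" % escaped)


def to_esis(xml_bytes):
    """Parse a document held in a bytes object and return the ESIS text."""
    out = io.StringIO()
    xml.sax.parseString(xml_bytes, ESISWriter(out))
    return out.getvalue()


print(to_esis(b'<doc id="d1"><p>Hello</p></doc>'), end="")
```

A grove builder or a validator would be just another handler class
plugged in where ESISWriter is here.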

I also wanted it to be possible to use the grove builder, the
validator, or both together.

The main catches I see here are:

1) Lack of Unicode support
2) Lack of speed

IMHO the solution to the speed problem is to do the xmllib/xmlproc
part in C, possibly via XMLTok, as Paul suggested. I think we should
have a Python version of this as well, and thanks to SAX, we can have
our cake and eat it too.
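
Roughly, the idea is that application code talks only to the SAX
interface and the parser behind it is chosen at run time. The sketch
below shows this with xml.sax.make_parser from the current standard
library, which tries the named driver modules in order and falls back
to its default driver when one cannot be imported. The driver name
"hypothetical_c_sax_driver" does not exist; it stands in for whatever
a C volunteer might write, and ElementCounter and count_elements are
made-up names as well.

```python
import io
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Application code: knows nothing about which parser feeds it."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def count_elements(xml_bytes, preferred_drivers=()):
    # Drivers are tried in order; a driver that cannot be imported is
    # skipped and the library's default driver is used instead.
    parser = xml.sax.make_parser(list(preferred_drivers))
    handler = ElementCounter()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.count


# "hypothetical_c_sax_driver" is an invented module name for illustration.
print(count_elements(b"<a><b/><c/></a>", ["hypothetical_c_sax_driver"]))
```

The handler is the same whichever driver ends up doing the parsing,
so a fast C parser and a portable Python one can coexist.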

Given the reaction from people to this Perl thing I'm uncertain as to
what I should do. Perhaps I should rush out a minimal package
consisting of a SAX shell, an ESIS outputter building on it and a SAX
driver for xmllib? That would give any C volunteers something to build
towards and those who want to deal with the grove/validation part
something to build from.

What say ye, good people? 

[1] Hamlet has 359 speeches and 1459 lines, more than three times what
    any other character in Hamlet/Tempest/Romeo&Juliet has. Kenneth
    Branagh must have a photographic memory. :)

-- 
"These are, as I began, cumbersome ways / to kill a man. Simpler, direct, 
and much more neat / is to see that he is living somewhere in the middle /
of the twentieth century, and leave him there."     -- Edwin Brock

 http://www.stud.ifi.uio.no/~larsga/      http://birk105.studby.uio.no/


_______________
DOC-SIG  - SIG for the Python Documentation Project

send messages to: doc-sig@python.org
administrivia to: doc-sig-request@python.org
_______________