[DOC-SIG] Re: What does this mean for Python?

Lars Marius Garshol larsga@ifi.uio.no
Fri, 13 Mar 1998 13:10:05 +0100


At 12:38 13.03.98 +0100, Sjoerd Mullender wrote:
>
> I have a question about the timings here.  How was the data fed to the 
> XML parser in xmllib.py?  If you do
>	python xmllib.py hamlet.xml
> the data is fed to the parser one character at a time.  

I think I fed it to the parser in 16K blocks, but I don't actually
remember how I did it.
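
To show what block-wise feeding looks like in practice, here is a
minimal sketch. It is not the code used for the timings discussed
above. It assumes the incremental expat-based parser from the xml.sax
module in the present-day standard library rather than xmllib, and
feed_in_blocks, the sample document and the 16K default block size
are illustrative names and values only.

```python
import io
import time
import xml.sax


def feed_in_blocks(stream, block_size=16 * 1024):
    """Feed a binary stream to an incremental SAX parser block by block.

    Returns a tuple (number of blocks fed, elapsed seconds).
    """
    parser = xml.sax.make_parser()  # expat-based incremental parser
    parser.setContentHandler(xml.sax.ContentHandler())  # no-op handler
    blocks = 0
    start = time.perf_counter()
    while True:
        block = stream.read(block_size)
        if not block:
            break
        parser.feed(block)  # hand the parser one block at a time
        blocks += 1
    parser.close()  # signal end of input so the parser can finish
    return blocks, time.perf_counter() - start


# Hypothetical in-memory document standing in for a file like hamlet.xml.
sample = b"<play>" + b"<line>To be, or not to be</line>" * 2000 + b"</play>"
blocks, seconds = feed_in_blocks(io.BytesIO(sample))
```

Calling feed_in_blocks with block_size=1 on the same data gives the
one-character-at-a-time case, which makes the two ways of feeding
easy to compare.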

Anyway, I will add a timer application to saxlib, so that anyone can do
their own speed testing and modify it as they wish. (I hope that satisfies
you as well, Jack.) I'll release that tonight (when I get home from work)
together with a driver for David Scherer's XML-Toolkit (announced on
comp.lang.python on Wednesday). Hopefully I'll be able to get xmlproc out
some time during the weekend.

The really important issue here, I think, is standardizing the parser
APIs. We now have Dan Connolly's XML scanner/parser, xmllib and David
Scherer's parser, with at least two more coming up.

I'm still waiting for reactions to my SAX proposal. What do you people
out there think? Does it look usable? Should we make it the standard 
Python API or should we scrap it? Or should we modify it? Should we 
change the method names to be more Python-like? And can it be used with
JPython to interoperate with things like Don Park's SAXDOM? All 
comments/thoughts on this would be very welcome.
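
To make the question concrete, here is a minimal sketch of an
application written against a SAX-style event interface. It uses the
xml.sax module from the present-day standard library, which is an
assumption for illustration and not the saxlib proposal itself, so
the method names may differ from the proposal. ElementCounter is a
hypothetical name.

```python
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Count how many times each element type starts in a document."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called by the parser driver for every start tag it sees.
        self.counts[name] = self.counts.get(name, 0) + 1


handler = ElementCounter()
xml.sax.parseString(b"<play><act><scene/><scene/></act></play>", handler)
```

The application only knows about the handler interface, so in
principle any parser with a driver for that interface could be
swapped in underneath it without changing ElementCounter.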

>I haven't done any optimisations in xmllib either.  One obvious
>optimization is to use regex instead of re (but I am not planning to
>do that).

I also use re and have no intention of changing either.

Sjoerd, please don't feel threatened by my making my own parser. I did it
partly for fun and partly to better understand the interplay between XML
entities, well-formedness checking, validation, grove building and what
actually goes to the application. So it was not because of dissatisfaction
with xmllib, but because I wanted to understand these things better.

In fact, when I use xmllib with the SAX canonical XML outputter I seem to
get the same results that James Clark's XP gives, so it looks as though
xmllib pretty much follows the standard. (I haven't done any rigorous
testing, just tested some features I was uncertain about.)

I've been telling my colleagues here at STEP Infotek (an SGML firm)
about this Python/XML effort and at least two of them (who now use Java
and Perl) reacted with "Hmmm... Maybe I should start using Python for my
XML work." One of them has even printed out the Python tutorial already.

So I think this can be very beneficial for Python if we do it right. And
I definitely agree with Sean McGrath: Python is infinitely better
than Perl for this kind of thing. Having a healthy crop of XML parsers
and tools written in Python would help make this clear to people. In fact,
I now have a list with links to free XML tools and it looks as though I
should split the parser section into Java parsers, Python parsers and
other parsers. IMHO, that's the kind of thing that would make an impression
on people getting into XML and looking for tools.

Just my $0.02, of course.

--Lars M.


_______________
DOC-SIG  - SIG for the Python Documentation Project

send messages to: doc-sig@python.org
administrivia to: doc-sig-request@python.org
_______________