[XML-SIG] Re: python SAX API

Lars Marius Garshol larsga@garshol.priv.no
16 Nov 2001 14:12:41 +0100


* Lars Marius Garshol
|
| One of the reasons to split startElement and startElementNS was to
| make things easier for newcomers to the API. They could then get
| started with ordinary XML programming with a relatively simple API,
| and not have to worry about the complexities of namespaces until
| they actually needed them. Having startElementNS as the default
| would negate that benefit.
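Here is the "newcomer" path in concrete form. This is a minimal sketch that assumes present-day Python 3's standard `xml.sax` package and its default expat driver, not the code base under discussion here. With namespace processing left at its default (off), the handler only has to implement `startElement(name, attrs)`, and `name` is the tag exactly as written, prefix included. `TagCollector` is a hypothetical name used for illustration.

```python
import xml.sax


class TagCollector(xml.sax.ContentHandler):
    """Hypothetical handler: records each element name and its attributes."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        # Namespace processing is off by default, so `name` is the raw
        # qualified name from the document (e.g. "x:item") and no
        # namespace URI is reported at all.
        self.seen.append((name, dict(attrs)))


handler = TagCollector()
xml.sax.parseString(b'<doc xmlns:x="urn:example"><x:item id="1"/></doc>', handler)
print(handler.seen)
```

No namespace knowledge is needed to write this handler, and that simplicity is the benefit the split is meant to preserve.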

* Alexandre Fayolle
| 
| My main concern is that this means Python is providing a "Canada
| Dry" version of SAX: it's called SAX, it looks like SAX, it tastes
| a bit like SAX, but it's not SAX.

It's Python SAX. It doesn't make sense to make it exactly like Java
SAX. We did that with SAX 1.0, and learned that it wasn't a good
idea. Java and Python are different, and different things work well in
the two languages. It will be no different with C++ SAX, and is no
different with Perl SAX.

| I've learnt the Java SAX API, and I'm fairly familiar with it. My
| main use of XML and Python so far was DOM (esp. 4DOM), so I didn't
| look at Python's SAX. Now I have to, and each time I do, I get nasty
| surprises.

Well, I haven't had this experience. I was part of the Java SAX design
process, and have used Java SAX for years now. I have no problems
switching back and forth between the two. I think the problem here is
your expectation that the two should be the same, not the design of
Python SAX. And given the lack of Python SAX documentation I think
your expectation is to some extent inevitable.

| Different parsers not implementing the same spec is yet another
| surprise.

If you know of drivers that are buggy, then please report them as
bugs to their developers.

| Different parsers representing an empty namespace URI with an empty
| string or None is another major pain.

Well, bugs are bugs. What we need to do is to document the right
approach. I don't think following the Java SAX design would have made
any difference.
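Until the right approach is documented, a defensive handler can treat both conventions as one. This is a minimal sketch that assumes Python 3's `xml.sax` with namespace processing switched on. `normalize_uri` and `NameCollector` are hypothetical names, and mapping the empty string to `None` is only one possible convention, not a documented rule. The standard expat driver already reports a missing URI as `None`, so the helper matters only when other drivers are in play.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


def normalize_uri(uri):
    # Hypothetical helper: treat "" and None as the same "no namespace"
    # value so downstream code does not depend on the driver's choice.
    return uri or None


class NameCollector(ContentHandler):
    """Hypothetical handler: records (uri, localname) for every element."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElementNS(self, name, qname, attrs):
        uri, localname = name  # with namespaces on, name is a (uri, localname) pair
        self.names.append((normalize_uri(uri), localname))


parser = xml.sax.make_parser()
parser.setFeature(feature_namespaces, True)
handler = NameCollector()
parser.setContentHandler(handler)
parser.parse(io.BytesIO(b'<doc><x:item xmlns:x="urn:example"/></doc>'))
print(handler.names)
```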
 
| I personally don't really care what the final choice is, as long as
| it is written in big red flashing letters in the Python Library
| Reference, and I know that any Python module that calls itself a SAX
| parser will behave correctly.

I agree. What we need to do to solve this problem is to write
documentation, both for users as well as a brief guide for SAX driver
writers. 

| Now, if we are going to change the SAX API substantially, I'd say
| that SAX is a poor choice of name, because people used to the real
| thing will get the impression that they do not need to read the
| Python docs.

It's too late to make substantial changes now. The current design was
carefully considered and developed over a long time. It's being used
in many different places and we're not going to change it in any
substantial way now.

Any developer who thinks Python SAX is the same as Java SAX needs to
adjust his expectations. It is not reasonable to assume that the two
will be identical, and anyone who does so will have nasty surprises,
which are well-deserved. Documentation will make the process of
adjustment easier, and even make it unnecessary in some cases.
 
|  * the startElement callback generates an exception because it gets
| called with the wrong number of arguments
| 
|  * the startElement callback renamed startElementNS is not called
| because they did not set the feature.
| 
|  * the startElementNS callback generates an exception because it is
| once again called with the wrong number of arguments.

None of these are difficult to explain to a developer. And in any case
they only happen when the developer starts out with the wrong ideas.
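The second and third surprises both come down to the namespaces feature. This is a minimal sketch that assumes Python 3's `xml.sax` and its default expat driver. It defines both callbacks with their Python signatures and records which one the parser actually calls with the feature off and with it on. `CallRecorder` and `run` are hypothetical names.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

DOC = b'<doc xmlns="urn:example"><item/></doc>'


class CallRecorder(ContentHandler):
    """Hypothetical handler: logs which start-element callback fires."""

    def __init__(self):
        super().__init__()
        self.calls = []

    # Python SAX signature: a single name plus an attributes object.
    def startElement(self, name, attrs):
        self.calls.append(("startElement", name))

    # Python SAX signature: a (uri, localname) pair, the qname, attributes.
    def startElementNS(self, name, qname, attrs):
        self.calls.append(("startElementNS", name))


def run(namespaces):
    # Build a fresh parser, toggle namespace processing, and parse DOC.
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, namespaces)
    handler = CallRecorder()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(DOC))
    return handler.calls


print(run(False))
print(run(True))
```

With the feature off, only `startElement` fires and `name` is a plain string. With it on, only `startElementNS` fires and `name` is a `(uri, localname)` pair.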
 
| At that point, they will have read the python sax documentation (and
| will have ROTFLed when seeing the reference to the Java SAX API
| website), and they'll be able to fix things, and they'll presumably
| have noticed that the prototype of the characters() callback is
| different. However, in the process, they'll have gotten a very messy
| impression of the XML support in Python.

If this happens, it's the inside of their heads that is messy, not
Python's XML support. I repeat: it is not reasonable to assume that
Python and Java SAX are identical. Java and Perl SAX are not
identical, nor are Java and C++ SAX. So why should Python be any
different?
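The `characters()` callback mentioned above is one concrete difference. Java SAX calls `characters(char[] ch, int start, int length)`, while Python SAX passes a single string. This is a minimal sketch that assumes Python 3's `xml.sax`, and `TextCollector` is a hypothetical name. It accumulates chunks because a parser is free to split one run of text across several calls.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler: gathers all character data in the document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    # Java SAX: characters(char[] ch, int start, int length).
    # Python SAX: characters(content), where content is already a str.
    def characters(self, content):
        # Text may arrive in several pieces, so collect rather than
        # assuming exactly one call per text node.
        self.chunks.append(content)

    def text(self):
        return "".join(self.chunks)


handler = TextCollector()
xml.sax.parseString(b"<p>Hello, <b>SAX</b> world</p>", handler)
print(handler.text())
```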
 
| Sorry for expressing myself on the topic so late in the development
| process.

Well, there's nothing anyone can do about that. You're here now, and
you have this concern, so let's thrash it out and see what we can do
about it. It's the wrong time to change the API, but there must be
other things we can do.

--Lars M.