[XML-SIG] saxlib, xml, _xmlplus, etc.

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Wed, 7 Mar 2001 08:16:39 +0100


> i do have tmproc working now (my first goal), but it was heavy going
> there because of the lack of road signs for the new person (new to
> python and its xml tools, but experienced with xml and other
> programming languages).

Sorry for the confusion. Please note that you are a "rare case":
most people complaining about bad documentation are familiar with
Python but new to XML, so they need to understand terms like "parser",
"event-driven", "tree-based", etc.

> in particular, what is the relationship between:
> - the saxlib available from http://www.garshol.priv.no/download/software/saxlib/
> - the xml core package that comes with python 2.x
> - the _xmlplus package that comes with the pyxml package from the xml-sig at sourceforge

As you can see from the "last release" date on the saxlib page, this
package is quite outdated. It was incorporated into PyXML some time
ago and is now known as "Python SAX version 1". Today, the preferred
SAX API is SAX2, which is included in both Python 2 and PyXML (PyXML
continues to provide the SAX1 interfaces as well).

In addition to the API spec, there are a number of SAX drivers in each
package. The saxlib has the SAX1 drivers, Python 2 has only an Expat
SAX2 driver, and PyXML has both SAX1 and SAX2 drivers (in the latter
category, only Expat and xmlproc).
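
To make the SAX2 interface concrete, here is a minimal sketch written
for a current Python 3 interpreter, where xml.sax still ships with the
Expat driver. The ElementCounter class, the count_elements helper and
the sample document are made up for illustration; they are not part of
any of the packages discussed here.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class ElementCounter(ContentHandler):
    """Hypothetical SAX2 handler that counts start tags by element name."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called by the driver once for every opening tag it encounters.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_elements(xml_bytes):
    """Parse a document held in memory and return {element name: count}."""
    handler = ElementCounter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.counts


if __name__ == "__main__":
    print(count_elements(b"<topicmap><topic/><topic/></topicmap>"))
```

The application only implements callbacks; which driver feeds them is
decided by the xml.sax package, which is why the same handler code can
run against the core package or against PyXML.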

PyXML is meant as a strict superset of the Python 2 XML offerings; in
all aspects that are present in Python 2, PyXML should behave
identically (as far as possible and reasonable).

> i can't find any explanation accessible from various top-level pages:
>    http://pyxml.sourceforge.net/topics/
>    http://www.python.org/sigs/xml-sig/
>    http://www.python.org/sigs/xml-sig/status.html
>    http://www.python.org/doc/howto/xml/ .

> nor do any of the three packages above seem to have any obvious
> mention of the other two.

In the README of PyXML itself, you'll notice that saxlib 1.0 is
included. The relationship with Python 2 should be documented better;
thanks for pointing that out.

> nor can i find an "xml and python faq", though surely this issue is
> an example of such a faq.

So far, people have been using the tutorial and the API documentation.
I couldn't say that any specific question is asked frequently - this
is the first time that your question has come up on this list.

> another would be: "will old python programs written against sax1
> work with the latest pyxml?"

Yes; people find out by trying. There is at least one minor
incompatibility: In Python 2, SAX drivers may produce Unicode strings,
which old applications may not expect.

> i've also looked at the ugly hack in xml/__init__.py for loading
> _xmlplus, though i still don't know what the difference is between
> the packages.

That hack is needed to provide the "strict superset" relationship
between PyXML and Python 2. It allows you to think of PyXML in terms
of "from xml.sax import ...", instead of "from _xmlplus.sax import
...". If PyXML is installed on top of Python 1.5.2, it will call its
package directory "xml".
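
The general mechanism behind such a hack can be sketched as follows.
This is an illustration of the idea only, not the actual contents of
xml/__init__.py: the module names demo_core and demo_plus, the flavour
attribute and the install_alias helper are all hypothetical, and the
code assumes a current Python 3 interpreter.

```python
import importlib
import sys
import types


def install_alias(core_name, plus_name):
    """Make 'import core_name' return the module registered as plus_name.

    Returns True if the replacement happened, False if the extended
    module is not available and the core module stays in place.
    """
    plus = sys.modules.get(plus_name)
    if plus is None:
        return False
    # Later imports consult sys.modules first, so they now see 'plus'.
    sys.modules[core_name] = plus
    return True


# Hypothetical stand-ins for a core package and its extended replacement.
core = types.ModuleType("demo_core")
core.flavour = "core"
plus = types.ModuleType("demo_plus")
plus.flavour = "plus"
sys.modules["demo_core"] = core
sys.modules["demo_plus"] = plus

if __name__ == "__main__":
    install_alias("demo_core", "demo_plus")
    print(importlib.import_module("demo_core").flavour)
```

Because the replacement happens in sys.modules, application code keeps
writing the core package name and never needs to know which of the two
it actually got.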

> btw also, is it expected that the pyxml win32 installer for 2.0 not
> work with the python 2.1 beta?

Yes, binary modules will need recompilation - the extension modules
contain references to "python20.dll", and all hell breaks loose if you load
conflicting python<foo>.dll into the same process (and try to access
them from the same interpreter).
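
As a rough sketch of the rule an installer has to enforce: a binary
extension is tied to the major.minor version it was built for. The
helper names runtime_dll_name and can_load are hypothetical, the DLL
naming pattern is inferred from the "python20.dll" example above, and
the code assumes a current Python 3 interpreter.

```python
import sys


def runtime_dll_name(version):
    """Return the Windows runtime DLL name for a (major, minor) version."""
    major, minor = version[:2]
    return "python%d%d.dll" % (major, minor)


def can_load(built_for, running=None):
    """True only if the extension was built for the running major.minor."""
    if running is None:
        running = sys.version_info
    return tuple(built_for[:2]) == tuple(running[:2])


if __name__ == "__main__":
    print(runtime_dll_name((2, 0)))
    print(can_load((2, 0), (2, 1)))
```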

> when i ran the installer, it didn't even find the 2.1 installation.

That is intentional, yes. To use PyXML with Python 2.1b1, you'll need
to compile it yourself from sources; that requires a VC++
installation.

> if binary packages are obsoleted by dot revisions in the core, it is
> going to be painful for everyone.

Unfortunately, that is a specific form of "DLL hell"; there is not
much that can be done about it except guaranteeing that conflicting
things are not used together - the installer refusing to install the
package anywhere else is one aspect of that.

> btw again, another faq should be how urllib deals with win32 drive letters.
>
> it barfs on things like "c:/tmp/myfile.xml" which is inconvenient
> but understandable, because there is no such thing

Likely, there should be, yes - but there appears to be no expert who
can say for sure what the "right way" is. In any case, you'll need to
pass URLs to urllib, and URLs as system identifiers to XML libraries.
On Unix, passing file names should "work" in most cases; on Windows,
things are a bit more complicated.
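
To illustrate both halves of the problem, here is a small sketch for a
current Python 3 interpreter. It uses pathlib and urllib.parse, which
postdate this thread, and the windows_path_to_file_url helper is a
made-up name. The file: URL form it produces is one common convention,
not a settled reading of RFC 1738.

```python
from pathlib import PureWindowsPath
from urllib.parse import urlparse


def windows_path_to_file_url(path):
    """Convert an absolute Windows path to a file: URL.

    PureWindowsPath applies Windows path rules on any platform, so
    this runs on Unix as well. Relative paths raise ValueError.
    """
    return PureWindowsPath(path).as_uri()


if __name__ == "__main__":
    # A bare drive-letter path parses as a URL whose scheme is the
    # drive letter, which is why URL-based code rejects it.
    print(urlparse("c:/tmp/myfile.xml").scheme)

    # Converted explicitly, the drive letter becomes part of the path.
    print(windows_path_to_file_url("c:/tmp/myfile.xml"))
    print(windows_path_to_file_url(r"C:\autoexec.bat"))
```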

If you can give a consistent story of how things *should* work, I'll
start a FAQ list (since your message is the third instance of this
question during this year - which makes it frequent :-). Out of
curiosity: how do you interpret RFC 1738 with regard to drive letters?
I.e. what is the URL referring to C:\autoexec.bat?

Regards,
Martin