Using Installer with PyXML

David Bolen db3l at fitlinxx.com
Mon Jan 22 14:46:46 EST 2001


"Dan Rolander" <dan.rolander at marriott.com> writes:

> I am by no means an expert on this, so if anybody understands this better
> and can provide simpler workarounds I would appreciate hearing it.

First, one short suggestion for the pyexpat.pyd issue - for your
"misc=" line, ensure you are specifying it as "xml.parsers.pyexpat",
and don't include the "pyd".  Otherwise I think the installer will
look for a "pyd" module beneath the pyexpat module which it won't be
able to find.  The module name itself is "xml.parsers.pyexpat", and
the installer will be able to determine it's a pyd and include the
file as a binary.  (I didn't have a 2.0 setup to verify on but with
1.5.2, I get the same ValueError as you if I include the extension).
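
As a minimal sketch of why the extension gets in the way, the snippet below splits a dotted name into its components. It assumes the installer treats the "misc=" entry as an ordinary dotted module name; the helper name is made up for illustration and is not part of the installer.

```python
def module_components(dotted_name):
    """Split a dotted module name into its package/module components."""
    return dotted_name.split(".")

# The real module name: the last component is the module itself.
good = module_components("xml.parsers.pyexpat")

# With the extension included, "pyd" looks like a submodule of pyexpat.
bad = module_components("xml.parsers.pyexpat.pyd")

print(good[-1], bad[-1])
```

Under that assumption, the second form asks for a module called "pyd" beneath pyexpat, which does not exist.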

In terms of the more general stuff - I may be off on this explanation
(which I'm sure someone will correct) as it's been a while since I
delved into the installer package and the import mechanism, but here's
my take.

On one level, the installer package is elegant in its simplicity, but
on another level it's inherently a very complex process.  At its core,
it packages any number of files together, placing them either in an
overall binary archive (file type independent) or in a compressed
Python archive (for imported modules) from which it can efficiently
import a single module.  The trickiest part is figuring out just
which modules need to be included for a particular script, because
Python can be so dynamic in how it accesses modules.

The xml module itself is tricky because it dynamically replaces
itself with the _xmlplus module when present.  This comes from the
desire to include xml in the core Python 2.0 but still permit a
transparent hand-off to a more recent version if the user has PyXML
installed.  The hand-off is done dynamically at run time, but the
installer tool needs to determine it at "build" time so it can include
all the right files under the right names.  This is why you need to
specifically request that the package be included in the final output.
Without actually executing the code, it's hard for the installer to
know that the _xmlplus module is eventually referenced.
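
The limitation can be sketched with a purely static scan of import statements. This is not the installer's own analysis code, only an illustration using the standard ast module; the sample source and the function name are assumptions made up for the example.

```python
import ast

# Hypothetical script: two ordinary imports plus one resolved at run time.
SOURCE = '''
import os
from xml.sax import saxutils

name = "_xmlplus"
mod = __import__(name)
'''

def static_imports(source):
    """Return module names that appear in import statements (no execution)."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module)
    return found

names = static_imports(SOURCE)
print(sorted(names))
```

The scan sees "os" and "xml.sax" but has no way to notice "_xmlplus", because that name only exists as a string until the code runs.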

As for the naming issue between xml and _xmlplus, I'm guessing
there ought to be a way to avoid the renaming, but I had suggested it
as a near-term workaround for the particular issue at hand, which
seems to be the installer import mechanism not following the rename.
I suspect something is preventing the installer package's import
mechanism from tracking the re-assignment of the xml module to
_xmlplus, but I'm not certain whether it's fundamentally not going to
work or just a bug in the current system.

Renaming the Python 2.0 xml library out of the way also reduces the
modules included in the installer output, since it's not necessary to
include the Python 2.0 xml library if you know it's just going to use
the _xmlplus version instead.  It does require some adjustment to at
least the one machine that is doing the installer build, but that's
the only one that needs to change.

The standard import mechanism loads each module in a hierarchy in
turn, and then looks into that module's dictionary to locate
submodules.  So when you do something like "from xml.sax import ...",
the built-in import mechanism finds "xml" in sys.modules (loading it
if necessary and, since it's a package, using __init__.py).  The loading
of __init__.py changes sys.modules["xml"] to point to
sys.modules["_xmlplus"] if present so everyone finds _xmlplus later
without knowing about it.
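
The hand-off described above can be sketched with in-memory stand-in packages. The names demo_xml and _demo_xmlplus are hypothetical, chosen so the real xml package is left alone, and the rebinding line is an assumption about what the package's __init__.py effectively does rather than a copy of it.

```python
import importlib
import sys
import types

def make_package(name):
    """Create an empty in-memory package (hypothetical stand-in)."""
    module = types.ModuleType(name)
    module.__path__ = []  # a __path__ attribute marks it as a package
    return module

# Stand-ins for the stock "xml" package and PyXML's "_xmlplus".
stock = make_package("demo_xml")
plus = make_package("_demo_xmlplus")
sys.modules["demo_xml"] = stock
sys.modules["_demo_xmlplus"] = plus

# Assumed effect of the package's __init__.py when the replacement is
# available: rebind its own sys.modules entry to the other package.
sys.modules["demo_xml"] = sys.modules["_demo_xmlplus"]

# Later imports of "demo_xml" now receive the replacement package.
resolved = importlib.import_module("demo_xml")
print(resolved.__name__)
```

The caller asked for demo_xml but holds a module whose __name__ is _demo_xmlplus, and that is the rename anything following the imports has to keep track of.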

The run time handling doesn't change when the application is run as an
"installer" package, but there seems to be a mismatch between the
actual execution of the code (which affects sys.modules in memory) and
the processing of import statements which directly load the referenced
modules from the installer archive.  That could be seen in your
failure with "xml.sax.saxlib" (which only exists in the PyXML _xmlplus
package) - the importer should have been trying to import
_xmlplus.sax.saxlib at that point, but it was still using the "xml"
prefix.

The installer setup establishes a new import handler that understands
how to load a specific module from a compressed archive (ZlibArchive)
that is stored in the overall executable archive (CArchive).  The
archive itself is flat - it works by storing each module with a "fully
qualified" module name.  So the xml.sax module is stored with an index
name of "xml.sax", and likewise the replacement "_xmlplus.sax" module.
When code does an import of something in xml.sax, the importer still
traverses the tree, but it would appear that at some point in the
process it is still trying to load "xml.sax" and not "_xmlplus.sax" as
it should be.  (Imputil seems to be using the module __name__
attribute so at first glance I'm not sure why it loses track of the
rename).
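
A toy model of the mismatch, assuming nothing about the real ZlibArchive format beyond "flat index keyed by fully qualified name". The dictionary contents and the lookup helper are invented for illustration.

```python
# Toy flat index: every module is stored under its fully qualified name.
ARCHIVE = {
    "xml": "stock package",
    "xml.sax": "stock sax package",
    "_xmlplus": "PyXML package",
    "_xmlplus.sax": "PyXML sax package",
    "_xmlplus.sax.saxlib": "PyXML-only module",
}

def lookup(package_name, submodule):
    """Look up a submodule by joining it to the given package name."""
    return ARCHIVE.get(package_name + "." + submodule)

# Using the name as written in the import statement finds nothing,
# because saxlib only exists under the _xmlplus prefix.
by_import_text = lookup("xml.sax", "saxlib")

# Using the replaced package's real name finds the module.
by_real_name = lookup("_xmlplus.sax", "saxlib")

print(by_import_text, by_real_name)
```

In this model the first lookup fails in the same way as the "xml.sax.saxlib" import, while the second succeeds because it uses the prefix the module was stored under.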

So, in the short term, changing your development setup so that
_xmlplus is actually installed as xml should take care of the issue
of tracking that rename in the installer package's import mechanism
(or the issue may be in the imputil module, which is now part of
Python 2.0).

The issue with the pyexpat.pyd and xml*.dll files is slightly
different.  The installer tries to find native DLLs (e.g.,
dependencies of Python pyd modules) that are required due to imports
of the pyds.  But if you include a file in "misc=", it just adds that
file and doesn't check for dependencies.  (You may be able to get this
with bindepends, but I'm not positive.)  It just means that when
specifying miscellaneous files, you might have to determine any DLL
dependencies yourself.  There are standard Windows tools to help out
with this if necessary.

The Python 2.0 distribution seems to split out some of the raw XML
processing into separate DLLs, whereas I think the PyXML distribution
includes them all in pyexpat.pyd.  That's why when you switched to the
PyXML version it no longer complained about the separate xml*.dll
files.  But as noted at the top of this post, I don't think you need
to copy the PYD file from the PyXML distribution into the DLLs
directory of your Python 2.0 distribution.  The only wrinkle
might be if some code did actually directly import "pyexpat" which
would expect it to be somewhere directly on sys.path rather than
beneath xml.parsers.  That might result in the installer including
both versions of pyexpat based on the import.

--
-- David
-- 
/-----------------------------------------------------------------------\
 \               David Bolen            \   E-mail: db3l at fitlinxx.com  /
  |             FitLinxx, Inc.            \  Phone: (203) 708-5192    |
 /  860 Canal Street, Stamford, CT  06902   \  Fax: (203) 316-5150     \
\-----------------------------------------------------------------------/


