[Distutils] python eggs and lxml

Martijn Faassen faassen at infrae.com
Sat Sep 24 11:13:39 CEST 2005


Phillip J. Eby wrote:
> At 11:57 PM 9/23/2005 +0200, Martijn Faassen wrote:
> 
>> I've been looking at Python eggs, easy install and setuptools with a lot
>> of interest -- very impressive work. I've been thinking of packaging
>> lxml with it, and, on the larger scale, looking into packaging Zope 3 
>> with it.
>>
>> Concerning lxml, I've run into a few questions, however.
>>
>> lxml depends on large external C libraries (libxml2 and libxslt).
>>
>> a) Is there a way to require versions of C libraries to be available in
>> a Python egg's dependencies? I can't seem to find a reference to this
>> scenario, but perhaps I didn't look carefully enough. The goal here
>> would be to give users trying to install lxml (or something that depends
>> on lxml) useful feedback about what in their system they're missing (or
>> have the wrong version of).
>  
> Your options here are the same as with any distutils package, which is 
> to say you have to figure it out yourself.  ;)  You can add code to look 
> for the libraries, embed your own source, etc.

Right. I was hoping I wouldn't need to dive into the internals of 
distutils, of course, but it's no surprise that I have to.
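
A minimal sketch of the kind of check a setup.py could run to give that 
feedback, assuming the xml2-config and xslt-config scripts that libxml2 
and libxslt install are on the PATH. The helper names and the minimum 
version numbers are placeholders for illustration, not lxml's actual 
requirements:

```python
import re
import subprocess


def parse_version(text):
    """Turn a version string such as '2.6.22' into a tuple of ints."""
    match = re.match(r"\s*(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError("unrecognised version string: %r" % text)
    return tuple(int(part) for part in match.group(1).split("."))


def installed_version(config_script):
    """Ask a *-config script for its version; None if it cannot be run."""
    try:
        output = subprocess.check_output([config_script, "--version"])
        return parse_version(output.decode("ascii", "replace"))
    except (OSError, subprocess.CalledProcessError, ValueError):
        return None


def check_requirement(name, config_script, minimum):
    """Return a message for the user, or None if the library is new enough."""
    found = installed_version(config_script)
    if found is None:
        return "%s not found (could not run %s)" % (name, config_script)
    if found < parse_version(minimum):
        return "%s %s found, but %s or later is required" % (
            name, ".".join(map(str, found)), minimum)
    return None


if __name__ == "__main__":
    # Placeholder minimum versions, chosen only for the example.
    requirements = [
        ("libxml2", "xml2-config", "2.6.16"),
        ("libxslt", "xslt-config", "1.1.12"),
    ]
    for name, script, minimum in requirements:
        problem = check_requirement(name, script, minimum)
        print(problem or "%s looks fine" % name)
```

This only covers platforms where the -config scripts exist, so Windows 
would need a different check.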

>> b) Going further, it'd be nice for some scenarios to actually be able to
>> include private versions of libxml2/libxslt in a Python egg. This is
>> especially interesting on Windows deployments (where you'd include a
>> binary of these libraries). Has something like this been considered? I saw
>> references to Pyrex support, but in lxml's case, the Pyrex code depends
>> on a large underlying library.
> 
> You can certainly do that; just list the appropriate .c files in your 
> Extension.
> 
> For PEAK on Python 2.3, I include an expat wrapping that adds the Python 
> 2.4 pyexpat features this way, using something like:
> 
>     Extension("peak.util.pyexpat", [
>             "src/peak/util/pyexpat.c", "src/expat/xmlparse.c",
>             "src/expat/xmltok.c", "src/expat/xmlrole.c",
>         ],
>         include_dirs=["src/expat"],
>         define_macros=[('XML_STATIC',1),('HAVE_MEMMOVE',1)]   # XXX
>     ),

libxml2 however is a huge C library with its own configure script (that 
it really uses, as it ports to a zillion platforms), so just listing C 
files to compile might very well not work, right?

I guess for Windows, I'd have to make distutils run the configure script, 
then extract the DLLs it produces and stuff them in the egg somehow. Any 
direction you'd point me towards for this?

>> c) It's also interesting though for deployment on linux. It'd be nice to
>> be able to include the source versions of specific versions of libxml2
>> and libxslt with lxml and to be able to build/install them such that
>> they are only used for lxml. This way the system libraries (which may be
>> out of date or otherwise the wrong version) would not be in play and
>> wouldn't be affected.
> 
> Yeah, just bake it in as shown above.

In this case, on Linux, I'd want to run the configure script when the 
egg is installed instead of when it's created, and stuff the .so files 
in the same place the egg is being installed to.
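
A rough sketch of the configure-and-build step, assuming a conventional 
autoconf layout (./configure, make, make install) and a private install 
prefix. The function names and paths are hypothetical. Something like 
this could be called from the run() method of a custom build command 
registered through setup()'s cmdclass argument, but that hooks into the 
build, so whether an egg can defer it to install time is still the open 
question:

```python
import os
import subprocess


def bundled_build_commands(prefix):
    """Commands for a conventional autoconf build installed under prefix."""
    return [
        ["./configure", "--prefix=" + prefix],
        ["make"],
        ["make", "install"],
    ]


def build_bundled_library(source_dir, prefix, run=subprocess.check_call):
    """Configure, build and install one bundled library into a private prefix.

    `run` can be replaced so the command sequence can be exercised
    without a compiler being present.
    """
    prefix = os.path.abspath(prefix)
    for command in bundled_build_commands(prefix):
        # Each step runs inside the unpacked library source tree.
        run(command, cwd=source_dir)
    # Shared libraries from such a build normally end up in <prefix>/lib.
    return os.path.join(prefix, "lib")
```

Installing into a private prefix rather than /usr/local is what keeps 
the system copies of the libraries out of play.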

>> If something like this were arranged, it'd be much easier to make lxml a
>> requirement of a large package like for instance Zope 3 (which is being
>> considered).
>>
>> I realize that any or all of these might be out of scope for easy
>> install -- in the Linux case, it might be deferred to a Linux package
>> management system, for instance. Still, I imagine the case where a
>> Python library has a dependency on a potentially large non-Python
>> codebase could be fairly common, and it'd be nice if such libraries
>> could be "first class" easy_install citizens so that other Python
>> libraries can safely depend on them. What are people's thoughts
>> about supporting such scenarios?
> 
> Not all libraries can be bundled by source, of course.  Sometimes you 
> really need to use whatever the "system version" is, for one reason or 
> another.  Database clients, for example, are something you really really 
> want to use the local version for.  

Right, there are competing use cases here. What I'd like is an easy 
install for lxml that just works for people, without them having to 
worry about the right libxml2 versions being installed, etc. On Windows 
this means binaries, and on Linux this likely means it'll just compile 
upon install.

Some classes of people, like distributors and some sysadmins, care about 
using the platform version of libxml2, and I'd also want to create an 
egg that allows you to install against the platform libraries. Could 
this be the same egg, or would a different egg be needed? If a 
different egg, how does this work with the dependency system? I.e. 
these two eggs would be alternatives to each other dependency-wise.
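
At the source level, at least, one setup.py could serve both cases by 
switching the Extension options at build time. A sketch, assuming a 
made-up LXML_USE_SYSTEM_LIBS environment variable and a made-up private 
prefix; it says nothing about how two resulting binary eggs would relate 
in the dependency system:

```python
import os


def extension_options(environ=os.environ, bundled_prefix="build/private"):
    """Keyword arguments for Extension(), for bundled or platform libraries.

    LXML_USE_SYSTEM_LIBS and bundled_prefix are hypothetical names used
    only for this example.
    """
    options = {"libraries": ["xslt", "xml2"]}
    if environ.get("LXML_USE_SYSTEM_LIBS", "") == "1":
        # Rely on the compiler's default search paths for the platform copies.
        return options
    prefix = os.path.abspath(bundled_prefix)
    options["include_dirs"] = [
        os.path.join(prefix, "include", "libxml2"),  # libxml/*.h lives here
        os.path.join(prefix, "include"),             # libxslt/*.h lives here
    ]
    options["library_dirs"] = [os.path.join(prefix, "lib")]
    return options
```

The result would be passed along as Extension("...", sources, **options). 
Finding the private shared libraries again at run time is a separate 
problem that this does not address.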

> I'm thinking that the distutils 
> could really use some sort of library-finding capabilities for that 
> stuff, assuming they don't already have some I just haven't found yet.

Yes, that would indeed be useful.
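
For the run-time half of that, later Python versions ship 
ctypes.util.find_library in the standard library. A small sketch of 
using it to report which shared libraries the platform lookup can see; 
the locate_libraries helper is made up for the example, and this does 
not find headers or check versions, so it is not a build-time solution:

```python
from ctypes.util import find_library


def locate_libraries(names):
    """Map each short library name to what the platform lookup reports.

    The value is a library file name or path, or None when nothing is found.
    """
    return {name: find_library(name) for name in names}


if __name__ == "__main__":
    for name, found in sorted(locate_libraries(["xml2", "xslt"]).items()):
        print("%-6s %s" % (name, found or "not found"))
```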

Thanks for the feedback!

Regards,

Martijn

