[Distutils] Making pip and PyPI work with conda packages

Chris Barker chris.barker at noaa.gov
Tue May 19 01:25:06 CEST 2015


A member of the conda dev team could answer this better than I, but I've
used it enough to _think_ I understand the basics:

On Mon, May 18, 2015 at 3:30 AM, Paul Moore <p.f.moore at gmail.com> wrote:

> One way forward in terms of building wheels is to use any build
> process you like to do an isolated build (I think it's --root that
> distutils uses for this sort of thing) and then use distlib to build a
> wheel from the resulting directory structure (or do it by hand, it's
> not much more than a bit of directory rearrangement and zipping things
> up).
>
> That process can be automated any way you like - although ideally via
> something general, so projects don't have to reinvent the wheel every
> time.
>

sure -- you can put virtually anything in a conda build script. what conda
build does is more or less this (sketched below):

* set up an isolated environment with some handy environment variables for
things like the python interpreter, etc.

* run your build script

* package up whatever got built.
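
As a rough illustration of those three steps, a recipe is not much more
than a meta.yaml plus a build script. A minimal sketch -- the name and
version here are made up:

    # meta.yaml -- a minimal sketch
    package:
      name: my_python_package
      version: "1.0"
    requirements:
      build:
        - python
      run:
        - python

    # build.sh -- conda build sets $PYTHON (and friends) before running it
    $PYTHON setup.py install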

> If processes like conda then used wheels as their input for building
> packages, the wheels could *also* be published


I'm not sure it's any easier to build a wheel, then make a conda package
out of it, than to build a conda package, and then make a wheel out of it.
Or have your build script build a wheel, and then independently build a
conda package.
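
For instance -- a sketch, assuming pip and wheel are available in the
build environment -- a conda build script could produce both in one pass:

    # build.sh -- build a wheel, then install that same wheel into the
    # conda build environment, so conda packages exactly what's in it
    $PYTHON setup.py bdist_wheel
    $PYTHON -m pip install dist/*.whl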

In any case, the resulting wheel would depend on an environment like the
one set up by conda build -- and that is an environment with all the
dependencies installed -- which is where this gets ugly.

[remember, making the wheel itself is the easy part]


> not least, does the
> way conda uses shared libraries make going via wheels impossible (or
> at least make the wheels unusable without conda's support for
> installing non-Python shared libraries)?


Pretty much, yes. conda provides a way to package up and manage arbitrary
stuff -- in this case, that would be non-python dependencies -- i.e. shared
libs.

So you can say that my_python_package depends on this_c_lib, and as long as
you, or someone else, has made a conda package for this_c_lib, then all is
well.
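
In recipe terms, that's one line per dependency -- a sketch, reusing the
placeholder names from above:

    # meta.yaml fragment -- a shared lib, declared like any other dependency
    requirements:
      build:
        - this_c_lib
      run:
        - this_c_lib

conda then installs this_c_lib's files into the environment (e.g. under
$PREFIX/lib), where your extension can link against and load them.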

But python, setuptools, pip, wheel, etc. don't have a way to handle that
shared lib as a dependency -- no standard place to put it, no way to
package it as a wheel, etc.

So the way to deal with this with wheels is to statically link everything.
But that's not how conda packages are built, so there's no way to leverage
conda here.
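
To make that concrete, here's a sketch of static linking in a setup.py,
assuming you've already built a static libthis_c_lib.a yourself -- the
paths and file names are made up:

    # setup.py -- statically link the C lib into the extension, so the
    # resulting wheel carries no external shared-lib dependency
    from setuptools import setup, Extension

    ext = Extension(
        "my_python_package._core",
        sources=["src/core.c"],                       # hypothetical source
        include_dirs=["vendor/include"],              # hypothetical path
        extra_objects=["vendor/lib/libthis_c_lib.a"], # the static lib
    )

    setup(name="my_python_package", version="1.0", ext_modules=[ext])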

We need to remember what leveraging conda would buy us:

conda doesn't actually make it any easier to build anything -- you need a
platform-specific build script to build a conda package.

conda does provide a way to manage non-python dependencies -- but that
doesn't buy you anything unless you are using conda to manage your system
anyway.

conda DOES provide a community of people figuring out how to build complex
packages, and building them, and putting them up for public dissemination.

So the thing that leveraging conda can do is reduce the need for a lot of
duplicated effort. And that effort is almost entirely about those third
party libs -- after all, a compiled extension that has no dependencies is
easy to build and put on PyPI. (OK, there is still a bit of duplicated
effort in making the builds themselves on multiple platforms -- but with CI
systems, that's not huge)

An example:

I have a complex package that not only depends on all sorts of hard-to-build
python packages, but also has its own C++ code that depends on the netcdf4
library. Which in turn, depends on the hdf5 lib, which depends on libcurl,
and zlib, and (I think one or two others).

Making binary wheels of this requires me to figure out how to build all
those deps on at least two platforms (Windows being the nightmare, but OS-X
is not trivial either, if I want to match the python.org build and support
older OS versions than I am running).
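
On OS-X, "matching the python.org build" mostly comes down to compiling
everything -- the libs and the extension -- with the same deployment
target and architectures. A sketch, with values depending on which
python.org installer you target:

    export MACOSX_DEPLOYMENT_TARGET=10.6
    export ARCHFLAGS="-arch i386 -arch x86_64"
    python setup.py bdist_wheel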

Then I could have a nice binary wheel that my users can pip install and
away they go. But:

1) They also need the Py_netCDF4 package, which may or may not be easy to
find. If not -- they need to go through all that build hell. Then they have
a package that is using a bunch of the same shared libs as mine -- and
hopefully no version conflicts...

2) my package is under development -- what I really want is for it to be
easy for my users to build from source, so they can keep it all up to date
from the repo. Now they need to get a development version of all those libs
up and running on their machines -- a heavy lift for a lot of people.

So now - "use Anaconda" is a pretty good solution -- it provides all the
libs I need, and someone else has figured out how to build them on the
platforms I care about.

But it would be nice if we could find a way for the "standard" python
toolchain to support this.

NOTE: someone on this list suggested providing (outside of PyPI + pip) a
set of static libs, all built and configured, so that I could say:
"install these libs from some_other_place, then you can build and run my
code" -- that may be doable with a community effort and no change in the
tooling, but it hasn't happened so far.

In short: what conda and the conda community provide is python-compatible
third party libs. We can't take advantage of that with pip+PyPI unless we
find a way to support third party libs with those tools.

-Chris

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov