[Distutils] Dynamic linking between Python modules (was: Beyond wheels 1.0: helping downstream, FHS and more)

Nick Coghlan ncoghlan at gmail.com
Sat May 16 19:12:51 CEST 2015


On 15 May 2015 at 04:01, Chris Barker <chris.barker at noaa.gov> wrote:
>> >> I'm confused -- you don't want a system to be able to install ONE
>> >> version
>> >> of a lib that various python packages can all link to? That's really
>> >> the
>> >> key use-case for me....
>
>
>>
>> Are we talking about Python libraries accessed via Python APIs, or
>> linking to external dependencies not written in Python (including
>> linking directly to C libraries shipped with a Python library)?
>
>
> I, at least, am talking about the latter. For a concrete example: libpng
> might be needed by PIL, wxPython, Matplotlib, and who knows what else. At
> this point, if you want to build a package of any of these, you need to
> statically link it into each of them, or distribute shared libs with each
> package -- if you are using them all together (which I do, anyway) you now
> have three copies of the same lib (but maybe different versions) all
> linked into your executable. Maybe there is no downside to that (I haven't
> had a problem yet), but it seems like a bad way to do it!
>
>> It's the latter I consider to be out of scope for a language specific
>> packaging system
>
>
> Maybe, but it's a problem to be solved, and the Linux distros more or less
> solve it for us, but OS-X and Windows have no such system built in (OS-X
> does have Brew and macports....)

Windows 10 has Chocolatey and OneGet:

* https://chocolatey.org/
* http://blogs.msdn.com/b/garretts/archive/2015/01/27/oneget-and-the-windows-10-preview.aspx

conda and nix then fill the niche for language independent packaging
at the user level rather than the system level.

>> - Python packaging dependencies are designed to
>> describe inter-component dependencies based on the Python import
>> system, not dependencies based on the operating system provided C/C++
>> dynamic linking system.
>
> I think there is a bit of fuzz here -- cPython, at least, uses the
> "operating system provided C/C++ dynamic linking system" -- it's not a
> totally independent thing.

I'm specifically referring to the *declaration* of dependencies here.
While CPython itself will use the dynamic linker to load extension
modules found via the import system, the loading of further
dynamically linked modules beyond that point is entirely opaque not
only to the interpreter runtime at module import time, but also to pip
at installation time.
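
To make that concrete, here's a rough Linux-only sketch (using the
stdlib _ssl extension module purely as an example) of the gap between
what the import system and pip can see, and what the dynamic linker
actually loads:

import importlib

# Importing a compiled extension module goes through the import system,
# which is the level that pip's dependency metadata operates at.
importlib.import_module("_ssl")

# But the shared libraries the dynamic linker pulled in to satisfy that
# import (libssl, libcrypto, ...) are only visible at the process level -
# nothing in any pip-visible metadata declares them.
with open("/proc/self/maps") as maps:
    loaded = sorted({line.split()[-1] for line in maps if ".so" in line})

for path in loaded:
    print(path)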

>> If folks are after the latter, then they want
>> a language independent package system, like conda, nix, or the system
>> package manager in a Linux distribution.
>
> And I am, indeed, focusing on conda lately for this reason -- but not all my
> users want to use a whole new system, they just want to "pip install" and
> have it work. And if you are using something like conda you don't need pip
> or wheels anyway!

Correct, just as if you're relying solely on Linux system packages,
you don't need pip or wheels. Aside from the fact that conda is
cross-platform, the main difference between the conda community and a
Linux distro is in the *kind* of software we're likely to have already
done the integration work for.

The key to understanding the difference in the respective roles of pip
and conda is realising that there are *two* basic distribution
scenarios that we want to be able to cover (I go into this in more
detail in https://www.python.org/dev/peps/pep-0426/#development-distribution-and-deployment-of-python-software):

* software developer/publisher -> software integrator/service operator
(or data analyst)
* software developer/publisher -> software integrator -> service
operator (or data analyst)

Note the second line has 3 groups and 2 distribution arrows, while the
first line only has the 2 groups and a single distribution step.

pip and the other Python specific tools cover that initial
developer/publisher -> integrator link for Python projects. This means
that Python developers only need to learn a single publishing
toolchain (the PyPA tooling) to get started, and they'll be able to
publish their software in a format that any integrator that supports
Python can consume (whether that's for direct consumption in a DIY
integration scenario, or to put through a redistributor's integration
processes).

On the consumption side, though, the nature of the PyPA tooling as a
platform-independent software publication toolchain means that if you
want to consume the PyPA formats directly, you need to be prepared to
do your own integration work. Many public web service developers are
entirely happy with that deal, but most system administrators and data
analysts trying to deal with components written in multiple
programming languages aren't.

That latter link, where the person or organisation handling the
software integration task is distinct from the person or organisation
running an operational service, or carrying out some data analysis, is
where the language independent redistributor tools like Chocolatey,
Nix, deb, rpm, conda, Docker, etc. all come in - they let a
redistributor handle the integration task (or at least some of it) on
behalf of their users, leaving those users free to spend more of their
time on problems that are unique to them, rather than having to
duplicate the redistributor's integration work on their own time.

If you look at those pipelines from the service operator/data analyst
end, then the *first* question to ask is "Is there a software
integrator that targets the audience I am a member of?". If there is,
then you're likely to have a better experience reusing their work,
rather than spending time going on a DIY integration adventure. In
those cases, the fact that the tooling you're using to consume
software differs from the tooling the original developers used to
publish it *should* be a hidden implementation detail. When it isn't,
it's either
a sign that those of us in the "software integrator" role aren't
meeting the needs of our audience adequately, or else it's a sign that
that particular user made the wrong call in opting out of tackling the
"DIY integration" task.

>> I'm arguing against supporting direct C level dependencies between
>> packages that rely on dynamic linking to find each other rather than
>> going through the Python import system,
>
> Maybe there is a mid ground. For instance, I have a complex wrapper system
> around a bunch of C++ code. There are maybe 6 or 7 modules that all need to
> link against that C++ code. On OS-X (and I think Linux, I haven't been doing
> those builds), we can statically link all the C++ into one python module --
> then, as long as that python module is imported before the others, they will
> all work, and all use that same already loaded version of that library.
>
> (this doesn't work so nicely on Windows, unfortunately, so there, we build a
> dll, and have all the extensions link to it, then put the dll somewhere it
> gets found -- a little fuzzy on those details)
>
> So option (1) for something like libpng is to have a compiled python module
> that is little more than something that links to libpng, so that it can be
> found and loaded by cPython on import, and any other modules can then
> expect it to be there. This is a big old kludge, but I think it could be
> done with little change to anything in Python or wheel, or... but it would
> require changes to how each package that uses that lib sets itself up and
> checks for and installs dependencies -- maybe not really possible. And it
> would be better if dependencies could be platform independent, which I'm
> not sure is supported now.
>
> option (2) would be to extend python's import mechanism a bit to allow it to
> do a raw "link in this arbitrary lib" action, so the lib would not have to
> be wrapped in a python module -- I don't know how possible that is, or if it
> would be worth it.

Your option 2 is specifically the kind of thing I don't want to
support, as it's incredibly hard to do right (to the tune of "people
will pay you millions of dollars a year to reduce-or-eliminate their
ABI compatibility concerns"), and has the potential to replace the
current you-need-to-be-able-to-build-this-from-source-yourself issue with
"oh, look, now you have a runtime ABI incompatibility, have fun
debugging that one, buddy".

Your option 1 seems somewhat more plausible, as I believe it should
theoretically be possible to use the PyCObject/PyCapsule API (or even
just normal Python objects) to pass the relevant shared library
details from a "master" module that determines which versions of
external libraries to link against, to other modules that always want
to load them, in a way that ensures everything is linking against a
version that it is ABI compatible with.

That would require someone to actually work on the necessary tooling
to help with that though, as you wouldn't be able to rely on the
implicit dynamic linking provided by C/C++ toolchains any more.
Probably the best positioned to tackle that idea would be the Cython
community, since they could generate all the required cross-platform
boilerplate code automatically.
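
Very roughly, the "plain Python objects" variant might look something
like this hypothetical py_libpng "master" module - purely a sketch,
with made-up names, and real tooling would also need to handle
per-platform file naming and ABI tagging:

# py_libpng/__init__.py (hypothetical)
import ctypes
import os

_HERE = os.path.dirname(os.path.abspath(__file__))
_LIB_NAME = "libpng16.so"  # placeholder; would differ per platform

# RTLD_GLOBAL makes the library's symbols visible to extension modules
# loaded later that expect to resolve them implicitly.
_handle = ctypes.CDLL(os.path.join(_HERE, _LIB_NAME),
                      mode=ctypes.RTLD_GLOBAL)

def get_library():
    """Return the shared library handle so dependents can share it."""
    return _handle

def get_library_path():
    """Return the on-disk location of the bundled library."""
    return os.path.join(_HERE, _LIB_NAME)

Dependent wheels would then declare an ordinary Python level dependency
on py_libpng and import it before loading their own extension modules.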

>>  (Another way of looking at this: if a tool can manage the
>> Python runtime in addition to Python modules, it's a full-blown
>> arbitrary software distribution platform, not just a Python package
>> manager).
>
> sure, but if it's ALSO a Python package manager, then why not? i.e. conda --
> if we all used conda, we wouldn't need pip+wheel.

conda's not a Python package manager, it's a language independent
package manager that was born out of the Scientific Python community
and includes Python as one of its supported languages, just like nix,
deb, rpm, etc.

That makes it an interesting alternative to pip on the package
*consumption* side for data analysts, but it isn't currently a good
fit for any of pip's other use cases (e.g. one of the scenarios I'm
personally most interested in is that pip is now part of the
Fedora/RHEL/CentOS build pipeline for Python based RPM packages - we
universally recommend using "pip install" in the %install phase over
using "setup.py install" directly)

>> Defining cross-platform ABIs (cf. http://bugs.python.org/issue23966)
>
> This is a mess that you need to deal with for ANY binary package -- that's
> why we don't distribute binary wheels on pypi for Linux, yes?

Yes, the reason we don't do *nix packages on any platform other than
Mac OS X is that the platform defines the CPython ABI along with
everything else.

It's a fair bit more manageable when we're just dealing with extension
modules on Windows and Mac OS X, as we can anchor the ABI on the
CPython interpreter ABI.

>> I'm
>> firmly of the opinion that trying to solve both sets of problems with
>> a single tool will produce a result that doesn't work as well for
>> *either* use case as separate tools can.
>
> I'm going to point to conda again -- it solves both problems, and it's
> better to use it for all your packages than mingling it with pip (though you
> CAN mingle it with pip...). So if we say "pip and friends are not going to
> do that", then we are saying: we don't support a substantial class of
> packages, and then I wonder what the point is of supporting binary packages
> at all?

Binary wheels already work for Python packages that have been
developed with cross-platform maintainability and deployability taken
into account as key design considerations (including pure Python
wheels, where the binary format just serves as an installation
accelerator). That category just happens to exclude almost all
research and data analysis software, because it excludes the libraries
at the bottom of that stack (not worrying too much about deployability
concerns bought the Scientific Python stack a lot of functionality,
but it *did* come at a price).

It's also the case that when you *are* doing your own system
integration, wheels are a powerful tool for caching builds, since you
can deal with ABI compatibility concerns through out of band
mechanisms, such as standardising your build platform and your
deployment platform on a single OS. If you both build and deploy on
CentOS 6, then it doesn't matter that your wheel files may not work on
CentOS 7, or Ubuntu, or Debian, or Cygwin, because you're not
deploying them there, and if you switched platforms, you'd just redo
your builds.
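
As a sketch of that workflow (the pip options used here are standard;
wrapping the calls in Python just keeps the example self-contained - a
shell script works equally well):

import subprocess

# On the build host: compile everything once into a local wheelhouse.
subprocess.check_call(
    ["pip", "wheel", "--wheel-dir", "wheelhouse", "-r", "requirements.txt"])

# On each deployment host (same platform as the build host, e.g. both
# CentOS 6): install from the wheelhouse only, never rebuilding from
# source and never hitting PyPI.
subprocess.check_call(
    ["pip", "install", "--no-index", "--find-links", "wheelhouse",
     "-r", "requirements.txt"])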

>> P.S. The ABI definition problem is at least somewhat manageable for
>> Windows and Mac OS X desktop/laptop environments
>
> Ah -- here is a key point -- because of that, we DO support binary packages
> on PyPI -- but only for Windows and OS-X. I'm just suggesting we find a way
> to extend that to packages that require a non-system non-python dependency.

At the point you're managing arbitrary external binary dependencies,
you've lost all the constraints that let us get away with doing this
for extension modules without adequate metadata, and are back to
trying to solve the same arbitrary ABI problem that exists on Linux.

This is multi-billion-dollar-operating-system-companies-struggle-to-get-this-right
levels of difficulty that we're talking about here :)

>>  but beyond
>> those two, things get very messy, very fast - identifying CPU
>> architectures, CPU operating modes and kernel syscall interfaces
>> correctly is still a hard problem in the Linux distribution space
>
> right -- but where I am confused is where the line is drawn -- it seems to
> me the line is REALLY drawn at "you need to compile some C (or Fortran,
> or ???) code", rather than at "you depend on another lib" -- the C code,
> whether it is a third party lib, or part of your extension, still needs to
> be compiled to match the host platform.

The line is drawn at ABI compatibility management. We're able to fuzz
that line a little bit in the case of Windows and Mac OS X extension
modules because we have the python.org CPython releases to act as an
anchor for the ABI definition.

We don't have that at all on other *nix platforms, and we don't have
it on Windows and Mac OS X either once we move beyond the CPython C
ABI (which encompasses the underlying platform ABI).

We *might* be able to get to the point of being able to describe
platform ABIs well enough to allow public wheels for arbitrary
platforms, but we haven't had any plausible sounding designs put
forward for that as yet, and it still wouldn't allow depending on
arbitrary external binaries (only the versions integrated with a given
platform).

>>  but the rise of aarch64 and IBM's
>> creation of the OpenPOWER Foundation is making the data centre space
>> interesting again, while in the mobile and embedded spaces it's ARM
>> that is the default, with x86_64 attempting to make inroads.
>
> Are those the targets for binary wheels? I don't think so.

Yes, they'll likely end up being one of Fedora's targets for prebuilt
wheel files: https://fedoraproject.org/wiki/Env_and_Stacks/Projects/UserLevelPackageManagement

>> This is why
>>
>> "statically link all the things" keeps coming back in various guises
>
> but if you statically link, you need to build the static package right
> anyway -- so it doesn't actually solve the problem at hand anyway.

Yes it does - you just need to make sure your build environment
suitably matches your target deployment environment.

"Publishing on PyPI" is only one of the use cases for wheel files, and
it isn't relevant to any of my own personal use cases (which all
involve a PyPI independent build system, with PyPI used solely as a
source of sdist archives).

>> The only solution that is known to work reliably for dynamic linking
>> is to have a curated set of packages all built by the same build
>> system, so you know they're using consistent build settings. Linux
>> distributions provide this, as do multi-OS platforms like nix and
>> conda. We *might* be able to provide it for Python someday if PyPI
>> ever gets an integrated build farm, but that's still a big "if" at
>> this point.
>
> Ah -- here is the issue -- but I think we HAVE pretty much got what we need
> here -- at least for Windows and OS-X. It depends what you mean by
> "curated", but it seems we have a (defacto?) policy for PyPi: binary wheels
> should be compatible with the python.org builds. So while each package wheel
> is supplied by the package maintainer one way or another, rather than by a
> central entity, it is more or less curated -- or at least standardized. And
> if you are going to put a binary wheel up, you need to make sure it matches
> -- and that is less than trivial for packages that require a third party
> dependency -- but building the lib statically and then linking it in is not
> inherently easier than doing a dynamic link.
>
> OK -- I just remembered the missing link for doing what I proposed above for
> third party dynamic libs: at this point dependencies are tied to a
> particular package -- whereas my plan above would require a dependency tied
> to a particular wheel, not the package as a whole. i.e.:
>
> my mythical matplotlib wheel on OS-X would depend on a py_libpng module --
> which could be provided as a separate binary wheel. But matplotlib in general
> would not have that dependency -- for instance, on Linux, folks would want
> it to build against the system lib, and not have another dependency. Even on
> OS-X, homebrew users would want it to build against the homebrew lib, etc...
>
> So what would be good is a way to specify a "this build" dependency. That
> can be hacked in, of course, but nicer not to have to.

By the time you've solved all these problems I believe you'll find you
have reinvented conda ;)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

