[Distutils] Dynamic linking between Python modules (was: Beyond wheels 1.0: helping downstream, FHS and more)

Nick Coghlan ncoghlan at gmail.com
Wed May 13 10:36:56 CEST 2015


On 13 May 2015 at 16:19, Ben Finney <ben+python at benfinney.id.au> wrote:
> Chris Barker <chris.barker at noaa.gov> writes:
>
>> On Tue, Apr 14, 2015 at 8:41 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>
>> > The point where I draw the line is supporting *dynamic* linking
>> > between modules -
>>
>> I'm confused -- you don't want a system to be able to install ONE version
>> of a lib that various python packages can all link to? That's really the
>> key use-case for me....
>
> Agreed. A key pain point for Python distributions is the lack of support
> for installing *one* instance of a Python library, and other Python
> modules able to discover such installed libraries which meet their
> declared dependency.

Are we talking about Python libraries accessed via Python APIs, or
linking to external dependencies not written in Python (including
linking directly to C libraries shipped with a Python library)?

It's the latter I consider to be out of scope for a language-specific
packaging system - Python packaging dependencies are designed to
describe inter-component dependencies based on the Python import
system, not dependencies based on the operating system's C/C++
dynamic linking model. If folks are after the latter, then they want
a language-independent packaging system, like conda, nix, or the
system package manager in a Linux distribution.
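
To make the distinction concrete, here's a rough sketch of the kind of
dependency our metadata *can* express (the distribution names are made
up for the example):

    # setup.py for a hypothetical distribution "spam"
    from setuptools import setup

    setup(
        name="spam",
        version="1.0",
        # "I need to be able to 'import bar' and get 1.7 or later" -
        # a dependency resolved through the Python import system
        install_requires=["bar >= 1.7"],
    )

There's deliberately no field in that metadata for "link my extension
against the shared library shipped inside some other wheel" - that's a
platform's job, not a language-specific packaging system's.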

> For example:
>
> * Python distribution ‘foo’, for Python implementation “D” on
>   architecture “X”, declares dependency on “bar >= 1.7”.
>
> * Installing Python distribution ‘bar’ version 1.8, on a host running
>   Python “D” for architecture “X”, goes to a single instance of the
>   ‘bar’ library for that architecture and Python implementation.
>
> * Invoking the ‘foo’ code on the same host will go looking
>   (dynamically?) for the dependency ‘bar’ and find version 1.8 already
>   installed in the one instance on that host. It uses that and all is
>   well.
>
> I'm in agreement with Chris that, while the above example may not
> currently play out as described, that is a fault to be fixed by
> improving Python's packaging and distribution tools so that it *is* a
> first-class use case.
>
> Nick, you seem to be arguing against that. Can you clarify?

I'm arguing against supporting direct C level dependencies between
packages that rely on dynamic linking to find each other rather than
going through the Python import system, as I consider that to be the point
where you cross the line into defining a new platform of your own,
rather than providing components that can plug into a range of
platforms. (Another way of looking at this: if a tool can manage the
Python runtime in addition to Python modules, it's a full-blown
arbitrary software distribution platform, not just a Python package
manager).
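
To spell out the difference (every name below is invented purely for
illustration):

    import ctypes
    import os

    # Fine: consume the other distribution through its Python API
    import bar
    bar.do_something()

    # The pattern I'm arguing against: bypassing the import system and
    # handing a shared library bundled inside another distribution
    # straight to the OS loader
    libbar = ctypes.CDLL(os.path.join(os.path.dirname(bar.__file__),
                                      "libbar.so"))

The second pattern only works if both distributions were built with
compatible toolchains and ABI settings, which is precisely the
guarantee a language-specific packaging system can't offer on its own.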

Defining cross-platform ABIs (cf. http://bugs.python.org/issue23966)
is an unholy mess that will be quite willing to consume vast amounts
of time without a great deal to show for it beyond what can already be
achieved more easily by telling people to just use one of the many
existing systems designed specifically to solve that problem (with
conda being my default recommendation if you care about Windows, and
nix being my recommendation if you only care about *nix systems).
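
(As a sketch of what that buys you: a conda environment spec can pin
the interpreter, the Python packages *and* the plain C libraries in
one place - the package names below are just illustrative, and what's
available depends on your channels.)

    # environment.yml - one resolver handles Python and non-Python
    # components alike
    name: imaging-demo
    dependencies:
      - python=3.4
      - numpy
      - pillow
      - libpng    # a plain C library, no Python code involved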

Integrator-oriented packaging tools and developer-oriented packaging
tools solve different problems for different groups of people, so I'm
firmly of the opinion that trying to solve both sets of problems with
a single tool will produce a result that doesn't work as well for
*either* use case as separate tools can.

Cheers,
Nick.

P.S. The ABI definition problem is at least somewhat manageable for
Windows and Mac OS X desktop/laptop environments (since you can mostly
pretend that architectures other than x86_64 don't exist, with perhaps
some grudging concessions to the existence of 32-bit mode), but beyond
those two, things get very messy, very fast - identifying CPU
architectures, CPU operating modes and kernel syscall interfaces
correctly is still a hard problem in the Linux distribution space, and
they've been working at it a lot longer than we have (and that's
*without* getting into things like determining which vectorisation
instructions are available). Folks often try to "deal" with this
complexity by wishing it away, but the rise of aarch64 and IBM's
creation of the OpenPOWER Foundation are making the data centre space
interesting again, while in the mobile and embedded spaces it's ARM
that is the default, with x86_64 attempting to make inroads.
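
(To give a sense of how coarse the information we key off today is,
this is roughly what the standard library exposes - and note that none
of it says anything about vectorisation support or syscall
compatibility:)

    import platform
    import struct
    import sysconfig

    # Built distribution platform tags are derived from a value like
    # this (e.g. "linux-x86_64" or "win-amd64")
    print(sysconfig.get_platform())

    # Raw CPU architecture string as reported by the OS
    print(platform.machine())

    # Pointer width distinguishes 32- from 64-bit mode, nothing more
    print(struct.calcsize("P") * 8)

    # Best-effort libc identification (only really meaningful on Linux)
    print(platform.libc_ver())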

As a result of all that, distributing software that uses dynamically
linked dependencies is genuinely difficult, to the point where even
operating system vendors struggle to get it right. This is why
"statically link all the things" keeps coming back in various guises
(whether that's Go's lack of dynamic linking support, or the surge in
adoption of Linux containers as a deployment technique), despite the
fact that these techniques inevitably bring back the *other* problems
that led to the invention of dynamic linking in the first place.

The only solution that is known to work reliably for dynamic linking
is to have a curated set of packages all built by the same build
system, so you know they're using consistent build settings. Linux
distributions provide this, as do multi-OS platforms like nix and
conda. We *might* be able to provide it for Python someday if PyPI
ever gets an integrated build farm, but that's still a big "if" at
this point.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

