[Distutils] Towards a simple and standard sdist format that isn't intertwined with distutils

Robert Collins robertc at robertcollins.net
Mon Oct 12 06:06:45 CEST 2015


EWOW, huge thread.

I've read nearly all of it but in order not to make it massively
worse, I'm going to reply to all the points I think need raising in
one mail :).

Top-level thoughts here; more point-by-point responses, with only
rough editing, below the fold.

I realise many things - like the issue of different wheels of the
same package consuming different numpy ABIs - have been touched on,
but AFAICT they are entirely orthogonal to the proposal, which was to
solve 'be able to use arbitrary build systems and still install with
pip'.

Of the actual problems with using arbitrary build systems, 99% seem
to boil down to 'setup-requires isn't introspectable by pip'
(https://github.com/pypa/pip/issues/1820). If it were, then
alternative build systems could be depended on reasonably, and the
mooted thunk from the setuptools CLI to an arbitrary build system
would be viable.

It is, in principle, a matter of one patch to teach pip *a* way to do
this (and then any and all build systems that want to can utilise it).
https://github.com/rbtcollins/pip/tree/declarative is a POC I did - my
next steps on that were to discuss the right ecosystem questions for
it - e.g. should pip consume it via setuptools, or should pip support
it as *the way*, with other systems (including setuptools) free to
use it?
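
To make "introspectable by pip" concrete, here is a rough sketch of
what an installer could do - the section and key names are assumptions
for illustration, not necessarily what the declarative branch settles
on: read the requirements from setup.cfg before ever running setup.py,
install them, and only then invoke the build.

    # Hypothetical sketch: read setup requirements from setup.cfg before
    # setup.py is run.  Section/key names ("metadata", "setup-requires")
    # are illustrative assumptions.
    import configparser

    def read_setup_requires(path="setup.cfg"):
        parser = configparser.ConfigParser()
        parser.read(path)
        if not parser.has_option("metadata", "setup-requires"):
            return []
        raw = parser.get("metadata", "setup-requires")
        # One requirement specifier per line, e.g. "cython>=0.23".
        return [line.strip() for line in raw.splitlines() if line.strip()]

An installer would install whatever that returns, and only then shell
out to setup.py (or to whatever build tool those requirements provide).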

A related but separate thing is being able to install things without
setuptools present at all - I've filed
https://github.com/pypa/pip/issues/3175 about that, but I think it's
-much- lower priority than reliably enabling third-party build tools.

-Rob

----


"
solved many of the hard problems here -- e.g. it's no longer necessary
that a build system also know about every possible installation
configuration -- so pretty much all we really need from a build system
is that it have some way to spit out standard-compliant wheels.
"

Actually pip still punts a *lot* here - we have bypasses to let
things like C compiler flags be set during a wheel build, and when
that's done we don't cache the wheels (or even try to build wheels).

"
While ``distutils`` / ``setuptools`` have taken us a long way, they
suffer from three serious problems: ...
(c) you are forced to use them anyway, because they provide the
standard interface for installing python packages expected by both
users and installation tools like ``pip``."

I don't understand the claim of (c) here - it's entirely possible to
write a package that doesn't use setuptools and have it do the right
thing: pip uses a subprocess to drive package installation, and the
interface is documented. The interface might be fugly, but it exists
and works. It is missing setup-requires handling, but so is setup.py
itself. The only thing we'd really need to do AFAICT is make our
setuptools monkeypatching thunk handle setuptools not being installed
(which would be a sensible thing to Just Do anyhow).
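
As a sketch of what "doesn't use setuptools" can look like at the
package end (deliberately over-simplified; a real setup.py must also
honour the options pip passes, such as --egg-base for egg_info and -d
for bdist_wheel, and the handlers here are placeholders for whatever
build system the project actually uses):

    #!/usr/bin/env python
    # Minimal sketch of a setup.py that never imports setuptools but still
    # answers the commands pip drives via a subprocess.
    import sys

    def egg_info(args):
        raise NotImplementedError("write PKG-INFO, requires.txt, ...")

    def bdist_wheel(args):
        raise NotImplementedError("produce a PEP 427 .whl in the -d directory")

    def develop(args):
        raise NotImplementedError("make the source tree importable in place")

    COMMANDS = {"egg_info": egg_info, "bdist_wheel": bdist_wheel,
                "develop": develop}

    if __name__ == "__main__":
        command = sys.argv[1] if len(sys.argv) > 1 else ""
        handler = COMMANDS.get(command)
        if handler is None:
            sys.exit("unsupported command: %r" % command)
        handler(sys.argv[2:])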

"
- query for build dependencies
- run a build, producing wheels as output
- set up the current source tree so that it can be placed on
  ``sys.path`` in "develop mode"
"

So we have that already: setup.py egg_info, setup.py bdist_wheel,
setup.py develop.
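
Roughly, the mapping an installer performs today (real invocations
pass more options, and error handling is elided):

    # Driving the existing setup.py interface from the installer's side.
    import subprocess
    import sys

    def setup_py(*args):
        subprocess.check_call([sys.executable, "setup.py"] + list(args))

    setup_py("egg_info")     # emit static metadata (PKG-INFO, requires.txt, ...)
    setup_py("bdist_wheel")  # run a build, producing a wheel as output
    setup_py("develop")      # put the source tree on sys.path in "develop mode"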

"A version 1-or-greater format source tree can be identified by the
presence of a file ``_pypackage/_pypackage.cfg``.
"

I really don't like this. It's going to be with us forever, it's
intrusive (it's visible), and so far it isn't shown to fix anything.


"to scatter files around willy-nilly never works, so we adopt the
convention that names starting with an underscore are reserved for
official use, and non-underscored names are available for
idiosyncratic use by individual projects."

I can see the motivation here, but is it really solving a problem we have?


On the specifics of the format: I don't want to kibbitz over strawman
aspects at this point.

Having the extension mechanism be both pip-specific and in Python
means that we're going to face significant adoption issues: the former
because pip is by no means the only thing around - and some distros
have until very recently been actively hostile to pip (which in turn
means we need to wait a decade or two for them to age out and stop
being used). The latter because we'll face all the headaches of
running arbitrary untrusted code and of dealing with two deps carrying
different versions of the same hook, and so on: I think it's an
intrinsically unsafe design.

@dstufft "problem with numpy.distutils, as I know you’re aware!). We
could do a minimal extension and add another defacto-ish standard of
allowing pip and setuptools to process additional setup_requires like
arguments from a setup.cfg to solve that problem though. The flip side
to this is that since it involves new capabilities in
pip/setuptools/any other installer, you'll have several years until
setup.cfg based setup_requires can actually be depended on.
"

Well. For *any* proposal that involves modifying pip, we have to
assume that all existing things keep working, and that anyone wanting
to utilise the new thing will have to either a) include a local
compatibility thunk, or b) error when being used from a too-old
toolchain. I don't think that should really be a factor in the design,
since it's intrinsic to the quagmire.

"Longer term, I think the answer is sdist 2.0 which has proper
metadata inside of it (name, version, dependencies, etc) but which
also includes a hook like this PEP has to specify the build system
that should be used to build a wheel out of this source distribution."

Any reason that can't just be setup.cfg ?
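
For concreteness, a sketch of what such a setup.cfg could carry - the
[build] section and its keys are invented here purely for
illustration, not an agreed spec:

    [metadata]
    name = example
    version = 1.0

    [build]
    # hypothetical keys: bootstrap requirements plus the tool that
    # produces the wheel
    requires =
        setuptools
        cython >= 0.23
    tool = setuptools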

@Daniel "I thought Robert Collins had a working setup-requires
implementation already? I have a worse but backwards compatible one
too at https://bitbucket.org/dholth/setup-requires/src/tip/setup.py" -
https://github.com/rbtcollins/pip/tree/declarative - I'll be updating
that probably early next year at this rate - after issue-988 anyhow.
The issue with your approach is that pip doesn't handle concurrent
installs well - and in fact it will end up having to lock its
environment somehow.

@Paul "
I can understand that a binary wheel may need a certain set of
libraries installed - but that's about the platform tags that are part
of the wheel definition, not about dependencies. Platform tags are an
ongoing discussion, and a good example of a partial solution that" -
that's where the draft PEP Tennessee and I started is aimed - at
making those libraries be metadata, not platform tags.

@Chris "
A given package might depend on numpy, as you say, and it may work
with all numpy versions 1.6 to 1.9. Fine, so we specify that in
install_requires. And this should be the dependency in the sdist, too.
If the package is pure python, this is fine and done.

But if the package has some extension code that uses the numpy C API
(a very common occurrence), then when it is built, it will only work
(reliably) with the version of numpy it was built with.

So the project itself, and the sdist, depend on numpy >= 1.6, but a
built binary wheel depends on numpy == 1.7 (for instance).

Which requires a binary (wheel) dependency that is somewhat different
from the source dependency.
" - so yes, that is where bdist_wheel should be creating different
metadata for that wheel. The issue that arises is that we need unique
file names so that they can coexist on PyPI or in local archives -
which is where wheel tags come in. I'd be in favour of not using
semantic tags for this - rather, hash the deps or something and just make a
unique file name. Use actual metadata for metadata.
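
To make "hash the deps" concrete, one possible (purely illustrative,
not an agreed scheme) approach is to fold the pinned build-time
dependencies into the wheel's optional build tag, so wheels built
against different numpy ABIs get distinct file names:

    # Illustrative only: derive a wheel build tag from the pinned
    # build-time dependencies, so a wheel built against numpy 1.7 and one
    # built against numpy 1.9 cannot collide in an archive.
    import hashlib

    def build_tag(pinned_build_deps):
        blob = "\n".join(sorted(pinned_build_deps)).encode("utf-8")
        # PEP 427 build tags must start with a digit, hence the "0_" prefix.
        return "0_" + hashlib.sha256(blob).hexdigest()[:8]

    # e.g. mypkg-1.0-<tag>-cp27-cp27mu-linux_x86_64.whl, where <tag>
    # differs between the two builds:
    print(build_tag(["numpy==1.7.1"]))
    print(build_tag(["numpy==1.9.2"]))

The actual binary dependency (numpy == 1.7.*, say) would still live in
the wheel's metadata; the hash only exists to keep file names unique.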

@Nathaniel "I know that one unpleasant aspect of the current design is that the
split between egg-info and actual building creates the possibility for
time-of-definition-to-time-of-use bugs, where the final wheel
hopefully matches what egg-info said it would, but in practice there
could be skew. (Of course this is true in any system which represents"
- actually see https://bugs.launchpad.net/pbr/+bug/1502692 for a bug
where this 'skew' is desirable: for older environments we want
tailored deps with no markers, for anything supporting markers we want
them - so the wheel will have markers and egg_info won't.
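
A tiny illustration of that desirable skew (the requirement name is
just an example): the wheel keeps the environment marker for
marker-aware tools, while egg_info written for an older toolchain gets
the list pre-evaluated for the interpreter doing the build.

    # Illustrative only.  Marker-aware consumers evaluate the marker at
    # install time; marker-unaware consumers get a list already tailored
    # to the interpreter that ran egg_info.
    import sys

    def wheel_requirements():
        return ['futures; python_version < "3"']

    def egg_info_requirements():
        return ["futures"] if sys.version_info[0] == 2 else []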

@Nathaniel "
(Part of the intuition for the last part is that we also have a
not-terribly-secret-conspiracy here for writing a PEP to get Linux
wheels onto PyPI and at least achieve feature parity with Windows / OS
X. Obviously there will always be weird platforms -- iOS and FreeBSD
and Linux-without-glibc and ... -- but this should dramatically reduce
the frequency with which people need sdist dependencies.)" - I think a
distinction between sdist and binary names for dependencies would be a
terrible mistake. It will add complexity to reasoning about and
describing things without solving any concrete problem that I can see.

@Nathaniel "I guess to make progress in this conversation I need some
more detailed explanations. I totally get that there's a long history
of thought and conversations behind the various assertions here like
"a sdist is fundamentally different from a VCS checkout", "there must
be a 1-1 mapping between sdists and wheels", "pip needs sdists that
have full wheel metadata in static form", and I'm barging in from the
outside with no context, but I literally have no idea why the specific
design features you're asking for are desirable or even viable. Right
now if I were to try and write the PEP you're asking for, then the
rationale section would just be "because Donald said so" over and over
:-). I couldn't write the motivation section, because I don't know any
problems that the PEP you're describing would fix for me as a package
author (which doesn't mean they don't exist, but!)." -- VCS trees are
(generally) by-humans for humans. They are the primary source of data
and can do things like inferring versions from commit data. sdists are
derived from the VCS tree and can include extra data (such as
statically defined version data). Wheels are derived from a tree on
disk and can (today) be built from either VCS trees or sdists. I'm not
sure that forcing an sdist step is beneficial - the egg-info step we
have today is basically that without the cost of compressing and
decompressing potentially large trees for no reason.

@Jeremy "An sdist is an installable package which just happens to _look_ a
lot like a source release tarball, but trying to pretend that
downstream packagers will want to use it as such leads to a variety
of pain points in the upstream/downstream relationship. For better
or worse a lot of distros don't want generated files in upstream
source code releases, since they need to confirm that they also ship
the necessary tooling to regenerate any required files and that the
generated files they ship match what their packaged tooling
produces." - Well, pbr doesn't work if you just tar up or git export
your VCS tree: it requires the chance to add metadata. And while
distros have whinged about pbr in a number of contexts, that hasn't
been one so far. Downstreams are pretty used to receiving tarballs
with generated files in them - as long as they *have the option* to
recreate those, so the source material isn't lost. [And for version
data, 'grab from git' is a valid answer there.] OTOH perhaps
ftpmaster just hasn't noticed and we're about to get a bug report ;)

