[Distutils] Towards a simple and standard sdist format that isn't intertwined with distutils

Nathaniel Smith njs at pobox.com
Sat Oct 3 03:03:30 CEST 2015


On Fri, Oct 2, 2015 at 2:30 PM, Donald Stufft <donald at stufft.io> wrote:
> On October 2, 2015 at 4:20:00 PM, Nathaniel Smith (njs at pobox.com) wrote:
[...]
>> There are plenty of more
>> complex examples too (e.g. ones that involve build/configure-time
>> decisions about whether to rely on particular system libraries, or
>> build/configure-time decisions about whether particular packages
>> should even be built).
>
> I don't think build/configure-time decisions are great ideas as it's near
> impossible to actually depend on them. For example, take Pillow: Pillow will
> conditionally compile against libraries that enable it to muck around with
> PNGs. However, if I *need* Pillow with PNG support, I don't have any mechanism
> to declare that. If instead, builds were *not* conditional and Pillow instead
> split its PNG capabilities out into its own package called say, Pillow-PNG
> which also did not conditionally compile against anything, but unconditionally
> did, then we could add in something like having Pillow declare a "weak"
> dependency on Pillow-PNG where we attempt to get it by default if possible, but
> we will skip installing it if we can't locate/build it. If you combine this
> with Extras, you could then easily make it so that people can depend on
> particular conditional features by doing something like ``Pillow[PNG]`` in
> their dependency metadata.

While I agree with the sentiment here, I don't think we can simply
unconditionally rule out build/configure-time decisions.

I gave an example in the other subthread of a numpy wheel, which
depending on build configuration might depend implicitly on the system
BLAS, might have BLAS statically linked, or might depend explicitly on
a "BLAS wheel". (And note that when configured to use a "BLAS wheel"
then this would actually be a build-dependency, not just a
runtime-dependency.) As far as downstream users are concerned, all of
these numpy wheels export exactly the same API -- how numpy finds BLAS
is just an internal detail. So in this case the problems your
paragraph above is worrying about just don't arise. And numpy
absolutely will need the option to be built in these different ways.
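
To make that concrete, here is a rough Python sketch of the three
configurations. Everything in it is hypothetical (the WheelMetadata
class, the numpy_wheel_metadata function, and the "blas" distribution
name are made up for illustration); the point is only that the declared
dependencies vary with the build configuration while the thing being
exported stays the same.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WheelMetadata:
    """Hypothetical, minimal stand-in for a built wheel's metadata."""
    name: str
    build_requires: tuple = ()
    run_requires: tuple = ()


def numpy_wheel_metadata(blas):
    """Return the metadata a numpy wheel might carry for each BLAS choice.

    blas is one of "system", "static" or "wheel" (labels invented here).
    """
    if blas == "system":
        # Implicit dependency on whatever BLAS the OS provides:
        # nothing appears in the Python-level metadata.
        return WheelMetadata("numpy")
    if blas == "static":
        # BLAS is linked into the extension modules: again nothing to declare.
        return WheelMetadata("numpy")
    if blas == "wheel":
        # Explicit dependency on a (hypothetical) "blas" distribution,
        # needed both to build against and to run.
        return WheelMetadata("numpy",
                             build_requires=("blas",),
                             run_requires=("blas",))
    raise ValueError("unknown BLAS configuration: %r" % (blas,))


configs = {kind: numpy_wheel_metadata(kind)
           for kind in ("system", "static", "wheel")}
```

All three results have the same name, so a downstream "depends on numpy"
is satisfied by any of them; only the "wheel" variant adds entries to
build_requires and run_requires.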

>>
>> For comparison, here's the Debian source package metadata:
>> https://www.debian.org/doc/debian-policy/ch-controlfields.html#s-debiansourcecontrolfiles
>> Note that the only mandatory fields are format version / package name
>> / package version / maintainer / checksums. The closest they come to
>> making promises about the built packages are the Package-List and
>> Binary fields which provide an optional hint about what binary packages
>> will be built, and are allowed to contain lies (e.g. they explicitly
>> don't guarantee that all the binary packages named will actually be
>> produced on every architecture). The only kind of dependencies that a
>> source package can declare are build-depends.
>
> Debian doesn't really have "source packages" like we do, but inside of the
> debian/ directory is the control file which lists all of the dependency
> information (or explicitly lists a placeholder where something can't be
> statically declared).

Someone who is more of an expert on debian packaging can correct me if
I'm wrong, but I'm 99% sure that this is incorrect, and in an
important way.

The actual interface between a build tool like dpkg-buildpackage and a
source package is: (a) the .dsc file, with the required fields I
mentioned, and (b) the debian/rules file, which is an opaque executable
that can be called to perform standard operations like "build" and
"clean" -- basically the moral equivalent of the hooks in our sdist
proposal.
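
As a rough illustration of how little part (a) pins down, here is a
Python sketch that checks a .dsc-style stanza for the mandatory fields.
The field names (Format, Source, Version, Maintainer, Checksums-*) are
the ones I believe Debian policy uses; the missing_fields helper and the
example stanza are made up, and the stdlib email parser is only an
approximation of the real format (it does not handle signed files).

```python
from email import message_from_string

# Mandatory single-valued fields; checksums are handled separately
# because there can be more than one Checksums-<algorithm> field.
REQUIRED = ("Format", "Source", "Version", "Maintainer")


def missing_fields(dsc_text):
    """Return the mandatory fields absent from a .dsc-style stanza."""
    fields = message_from_string(dsc_text)
    missing = [name for name in REQUIRED if name not in fields]
    if not any(key.startswith("Checksums-") for key in fields.keys()):
        missing.append("Checksums-*")
    return missing


# Made-up example stanza; the checksum line is a placeholder.
EXAMPLE = """\
Format: 3.0 (quilt)
Source: hello
Version: 2.10-1
Maintainer: Jane Doe <jane@example.org>
Checksums-Sha256:
 0000 123 hello_2.10.orig.tar.gz
"""

complete = missing_fields(EXAMPLE)
incomplete = missing_fields("Source: hello\n")
```

Note that nothing here says anything about runtime dependencies or about
which binary packages will come out the other end; that part is left to
whatever debian/rules does when it is run.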

The debian/control file does have a conventional format, but this is
just convention -- most-or-all debian/rules scripts use the same
set of tools to work with this file, and expect it to be in the same
place. But if, say, Debian decides that they need a new kind of
placeholder to handle a situation that hasn't arisen before, then
there's no need to change the definition of a source package: you just
add support for the new placeholder to the tools that work with this
file, and then packages that want to make use of the new placeholder
just have to Build-Depend on the latest version of those tools.

This is the idea motivating the sdist PEP's design: you can't specify
all of a source distribution's metadata statically, and given that
you'll be specifying at least part of the metadata dynamically, you
want to do it in a way that you can change without having to write a
new PEP, update pip, etc.

>>
>> > To a similar tune, this PEP also doesn't make it possible to really get at
>> > any other metadata without executing software. This makes it practically
>> > impossible to safely inspect an unknown or untrusted package to determine what
>> > it is and to get information about it. Right now PyPI relies on the uploading
>> > tool to send that information alongside of the file it is uploading, but
>> > honestly what it should be doing is extracting that information from within the
>> > file. This is sort of possible right now since distutils and setuptools both
>> > create a static metadata file within the source distribution, but we don't rely
>> > on that within PyPI because that information may or may not be accurate and may
>> > or may not exist. However the twine uploading tool *does* rely on that, and
>> > this PEP would break the ability for twine to upload a package without
>> > executing arbitrary code.
>>
>> Okay, what metadata do you need? We certainly could put name / version
>> kind of stuff in there. We left it out because we weren't sure what
>> was necessary and it's easy to add later, but anything that's needed
>> by twine fits neatly into the existing text saying that we should
>> "include extra metadata in source distributions if it helps solve
>> specific problems that are unique to distribution" -- twine uploads
>> definitely count.
>
> Everything that isn't specific to a built wheel. Look at the previously
> accepted metadata specs as well as PEP 426. If you're not including a field
> that was included in one of those, there should be a rationale for why that
> field is no longer being included.

The default rationale is just "let's keep our options open" -- it's
much easier to add fields later than to remove them.

In particular I hesitate a little bit to just drop in everything from
PEP 426 and friends, because previous specs haven't really thought
through the distinction between sdists and wheels -- e.g. if an sdist
generates two wheels, they probably won't have the same name,
description, trove classifiers, etc. They may not even have the same
version (e.g. if two different tools with existing numbering schemes
get merged into a single distribution -- esp. if one of them needs an
epoch marker). So it may well make sense to have an "sdist description
field", but it's not immediately obvious that it's identical to a
wheel's description field.
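
A small Python sketch of the kind of case I have in mind. All of the
names and versions are invented, and the Sdist/Wheel classes are not
from any spec; "1!0.9" is just PEP 440's epoch syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wheel:
    name: str
    version: str
    summary: str


@dataclass(frozen=True)
class Sdist:
    name: str
    version: str
    wheels: tuple


# Two previously separate tools merged into one source distribution,
# each keeping its own name and version numbering.
merged = Sdist(
    name="toolbox",
    version="1.0",
    wheels=(
        Wheel("toolbox-cli", "3.2", "Command-line front end"),
        Wheel("toolbox-core", "1!0.9", "Core library"),
    ),
)


def fields_shared_with_sdist(sdist):
    """List which of name/version every wheel shares with the sdist."""
    shared = []
    for attr in ("name", "version"):
        if all(getattr(w, attr) == getattr(sdist, attr)
               for w in sdist.wheels):
            shared.append(attr)
    return shared


shared = fields_shared_with_sdist(merged)
```

Here neither field is shared, so copying the wheel-level fields up to
the sdist unchanged would not have an obvious meaning.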

I mean, in practice it's probably no big deal -- a description is some
text for human readers, whatever, it's useful and it'll be fine. But
given that we can trivially add more fields to the pypackage.cfg file
later, and that current sdists don't have any of this metadata, I just
don't want to risk blocking progress on one axis (enabling better
build systems) while waiting to achieve maximal progress on another
mostly-orthogonal axis (having nice metadata in sdists for tools like
twine to take advantage of).

Bottom line: If after further discussion we reach the point where the
only thing blocking this is the addition of name and description and
trove classifier fields, then of course we'll just add those to the
PEP :-).

-n

-- 
Nathaniel J. Smith -- http://vorpus.org

