[Distutils] PEP 426 moved back to Draft status

Nick Coghlan ncoghlan at gmail.com
Fri Mar 10 10:55:49 EST 2017


On 11 March 2017 at 00:52, Nathaniel Smith <njs at pobox.com> wrote:

> On Fri, Mar 10, 2017 at 1:26 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> > Hi folks,
> >
> > After a few years of dormancy, I've finally moved the metadata 2.0
> > specification back to Draft status:
> > https://github.com/python/peps/commit/8ae8b612d4ea8b3bf5d8a7b795ae8aec48bbb7a3
>
> We have lots of metadata files in the wild that already claim to be
> version 2.0. If you're reviving this I think you might need to change
> the version number?
>

They're mostly in metadata.json files, though. That said, version numbers
are cheap, so I'm happy to skip straight to 3.0 if folks think it makes
more sense.


> > Based on our last round of discussion, I've culled a lot of the complexity
> > around dependency declarations, cutting it back to just 4 pre-declared
> > extras (dev, doc, build, test),
>
> I think we can drop 'build' in favor of pyproject.toml?
>

No, as that's a human-edited input file, not an output file from the sdist
generation process.


> Actually all of the pre-declared extras are really relevant for sdists
> rather than wheels. Maybe they should all move into pyproject.toml?
>

Think "static release metadata in an API response from PyPI" for this
particular specification, rather than something you'd necessarily check
into source control. That's actually one of the big benefits of doing this
post-pyproject.toml - with that taking care of the build system
bootstrapping problem, it frees up pydist.json to be entirely an artifact
of the sdist generation process (which is then copied along to the wheel
archives and the installed package as well).

That said, that's actually an important open question: is pydist.json
always preserved unmodified through the sdist->wheel->install and
sdist->install process?

There's a lot to be said for treating the file as immutable, and instead
adding *other* metadata files as a component moves through the distribution
process. If so, then it may actually be more appropriate to call the
rendered file "pysdist.json", since it contains the sdist metadata
specifically, rather than arbitrary distribution metadata.
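As a concrete sketch of what such a rendered file might contain - the field names below follow the PEP 426 draft, but the exact schema was still in flux, so treat this as illustrative only:

```python
import json

# Hypothetical minimal pysdist.json content; field names follow the
# PEP 426 draft, but the final schema was never standardised.
sdist_metadata = {
    "metadata_version": "2.0",
    "name": "example-dist",
    "version": "1.0.0",
    "summary": "An example distribution",
}

# Rendered once at sdist generation time; if the file is treated as
# immutable, it would then be copied unmodified into wheel archives
# and the installed package.
rendered = json.dumps(sdist_metadata, indent=2, sort_keys=True)
print(rendered)
```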

>
> > and some reserved extras that can be used to
> > say "don't install this, even though you normally would" (self, runtime).
>
> Hmm. While it's not the most urgent problem we face, I really think in
> the long run we need to move the extras system to something like:
>
>     https://mail.python.org/pipermail/distutils-sig/2015-October/027364.html
>
> The current extras system is inherently broken with respect to
> upgrades, and reified extras would solve this, along with several
> other intractable problems (e.g. numpy ABI tracking).
>
> So from that perspective, I'm wary of adding new special case "magic"
> to the extras system. Adding conventional names for things like
> test-dependencies is fine, that doesn't pose any new obstacles to a
> future migration. But adding complexity to the "extras language" like
> "*", "self", "runtime", etc. does make it harder to change how extras
> work in the future.
>

Technically the only part of that which the PEP really locks in is barring
the use of "self" and "runtime" as extras names (which needs to be
validated by a check against currently published metadata to see if anyone
is already using them).

'*' is already illegal due to the naming rules, and the '-extra' syntax is
also an illegal name, so neither of those actually impacts the metadata
format, only what installation tools allow. The main purpose of having them
in the PEP is to disallow using those spellings for anything else and
instead reserve them for the purposes described in the PEP.

I'd also be fairly strongly opposed to converting extras from an optional
dependency management system into a "let multiple PyPI packages target the
same site-packages subdirectory" model, because we already know that's a
nightmare from the Linux distro experience (having a clear "main" package
that owns the parent directory with optional subpackages solves *some* of
the problems, but my main reaction is still "Run awaaay").

It especially isn't needed just to solve the "pip forgets what extras it
installed" problem - that technically doesn't even need a PEP to resolve,
it just needs pip to drop a pip specific file into the PEP 376 dist-info
directory that says what extras to request when doing future upgrades.
Similarly, the import system offers so much flexibility in checking for
optional packages at startup and lying about where imports are coming from
that it would be hard to convince me that installation customisation to use
particular optional dependencies *had* to be done at install time.
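The dist-info fix could be as small as a pip-maintained record file next to the existing PEP 376 metadata. A sketch - the filename `requested-extras.txt` and its format are purely hypothetical, not anything pip actually writes:

```python
import os

def record_requested_extras(dist_info_dir, extras):
    # Hypothetical pip-specific file in the PEP 376 dist-info directory
    # recording which extras the user originally asked for, one per line.
    path = os.path.join(dist_info_dir, "requested-extras.txt")
    with open(path, "w") as f:
        f.write("\n".join(sorted(extras)) + "\n")

def read_requested_extras(dist_info_dir):
    # On upgrade, re-request whatever extras were recorded at install
    # time; an absent file means no extras were requested.
    path = os.path.join(dist_info_dir, "requested-extras.txt")
    try:
        with open(path) as f:
            return [line.strip() for line in f if line.strip()]
    except FileNotFoundError:
        return []
```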


> I feel like most of the value we get out of these could be had by just
> standardizing the existing convention that packages should have an
> explicit "all" extra that includes all the feature-based extras,


That's the first I've heard of that convention, so it may not be as
widespread as you thought it was :)


> but
> not the special development extras? This also provides flexibility for
> cases like, a package where there are two extras that conflict with
> each other -- the package authors can pick which one they recommend to
> put into "all".
>

That's actually the main problem I had with '*' - it didn't work anywhere
near as nicely once the semantic dependencies were migrated over to being
part of the extras system.

Repeating the same dependencies under multiple extra names in order to
model pseudo-sets seems error prone and messy to me, though.

So perhaps we should add the notion of "extra_sets" as a first class
entity, where they're named sets of declared extras? And if you don't
declare an "all" set explicitly, you get an implied one that consists of
all your declared extras.
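As a sketch of the shape that might take (the key names are illustrative, not from the PEP):

```python
def get_extra_sets(metadata):
    # Named sets of declared extras; if no "all" set is declared
    # explicitly, imply one covering every declared extra.
    sets = dict(metadata.get("extra_sets", {}))
    if "all" not in sets:
        sets["all"] = sorted(metadata.get("extras", []))
    return sets

metadata = {
    "extras": ["doc", "test", "ssl"],
    "extra_sets": {"dev": ["doc", "test"]},
}
print(get_extra_sets(metadata))
# {'dev': ['doc', 'test'], 'all': ['doc', 'ssl', 'test']}
```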

For migration of existing metadata that uses "all" as a normal extra, the
translation would be:

- declared extras are added to "all" in order until all of the dependencies
in "all" are covered or all declared extras are included
- any dependency in "all" that isn't in another extra gets added to a new
"_all" extra
- "extras" and "extra_sets" are populated accordingly

Tools consuming the metadata would then just need to read "extra_sets" and
expand any named sets before passing the list of extras over to their
existing dependency processing machinery.
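The consuming side is then just a substitution pass, something like this (a sketch; the function name is illustrative):

```python
def expand_extras(requested, extra_sets):
    # Replace any named set with its member extras, leaving ordinary
    # extras untouched, before handing the result to the existing
    # dependency processing machinery.
    expanded = []
    for name in requested:
        if name in extra_sets:
            expanded.extend(extra_sets[name])
        else:
            expanded.append(name)
    # Preserve order while dropping duplicates.
    return list(dict.fromkeys(expanded))

print(expand_extras(["all", "ssl"], {"all": ["doc", "test"]}))
# ['doc', 'test', 'ssl']
```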

> > I've also deleted a lot of the text related to things that we now don't
> > need to worry about until the first few standard metadata extensions are
> > being defined.
> >
> > I think the biggest thing it needs right now is a major editing pass from
> > someone that isn't me to help figure out which explanatory sections can be
> > culled completely, while still having the specification itself make sense.
> >
> > From a technical point of view, the main "different from today" piece that
> > we have left is the Provides & Obsoleted-By fields, and I'm seriously
> > wondering if it might make sense to just delete those entirely for now, and
> > reconsider them later as a potential metadata extension.
>
> Overall the vibe I get from the Provides and Obsoleted-By sections is
> that these are surprisingly complicated and could really do with their
> own PEP, yeah, where the spec will have room to breathe and properly
> cover all the details.
>
> In particular, the language in the "provides" spec about how the
> interpretation of the metadata depends on whether you get it from a
> public index server versus somewhere else makes me really nervous.
>

Yeah, virtual provides are a security nightmare on a public index server -
distros are only able to get away with it because they maintain relatively
strict control over the package review process.


> Experience suggests that splitting up packaging PEPs is basically
> never a bad idea, right? :-)
>

Indeed :)

OK, I'll put them on the chopping block too, under the assumption they may
come back as an extension some day, if it ever makes it to the top of
someone's list of "things that bother them enough about Python packaging to
do something about them".


> As a general note I guess I should say that I'm still not convinced
> that migrating to json is worth the effort, but you've heard those
> arguments before and I don't have anything new to add now, so :-).
>

The main benefit I see will be to empower utility APIs like distlib (and
potentially Warehouse itself) to better hide both the historical and
migratory cruft by translating everything to the PEP 426 format, even if
the source artifact only includes the legacy metadata. Unless the plumbing
actually breaks, nobody other than the plumber cares when it's a mess, as
long as the porcelain is shiny and clean :)

Cheers,
Nick.

P.S. Something I'm getting out of this experience: if you can afford to sit
on your hands for 3-4 years, that's a *really good way* to avoid falling
prey to "second system syndrome" [1] :)

P.P.S. Having no budget to pay anyone else and only limited time and
attention of your own also turns out to make it easier to avoid ;)

[1] http://coliveira.net/software/what-is-second-system-syndrome/

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

