[Distutils] PEP 439 and pip bootstrap updated

Sat Jul 13 01:14:05 CEST 2013

Donald Stufft <donald <at> stufft.io> writes:

> I could probably be convinced about something that makes handling versions
> easier going into the standard lib, but that's about it.

That seems completely arbitrary to me. Why just versions? Why not, for
example, support for the wheel format? Why not agreed metadata formats?

> There's a few reasons that I don't want these things added to the stdlib
> themselves.
> 
> One of the major ones is that of "agility". We've seen with distutils how
> impossible it can be to make improvements to the system. Now some of this

You say that, but setuptools, the poster child of packaging, improved quite
a lot on distutils. I'm not convinced that it would have been as successful
if there were no distutils in the stdlib, but of course you may disagree.

I'm well aware of the "the stdlib is where software goes to die" school of
thought, and I have considerable sympathy for where it's coming from, but
let's not throw the baby out with the bathwater. The agility argument could
be made for lots of areas of functionality, to the point where you just
basically never add anything new to the stdlib because you're worried about
an inability to cope with change. Also, it doesn't seem right to point to
particular parts of the stdlib which were hard to adapt to changing
requirements and draw the conclusion that all software added to the stdlib
would be equally hard to adapt. Of course one could look at a specific piece
of software and assess its adaptability, but otherwise, isn't it verging on
just arm-waving?

> is made better with the way the new system is being designed with versioned
> metadata but it doesn't completely go away. We can look at Python's past to
> see just how long any individual version sticks around and we can assume that
> if something gets added now that particular version will be around for a long
> time.

That doesn't mean that overall improvements can't take place in the stdlib.
For example, getopt -> optparse -> argparse.
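As a rough illustration of that progression, the sketch below parses the same
command line with getopt and with argparse, both of which still ship in the
stdlib side by side. The option names and the sample argument list are made up
for the example.

```python
import argparse
import getopt

# Hypothetical command line, used only for illustration.
argv = ["-v", "--output", "out.txt", "input.txt"]

# getopt: the original C-style interface; the caller interprets the pairs.
opts, rest = getopt.getopt(argv, "v", ["output="])

# argparse: the later, declarative interface; it returns a namespace object.
parser = argparse.ArgumentParser()
parser.add_argument("-v", action="store_true")
parser.add_argument("--output")
parser.add_argument("source")
ns = parser.parse_args(argv)

print(opts, rest)
print(ns.v, ns.output, ns.source)
```

The older module was not changed incompatibly to get here; a better one was
added alongside it.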

> Another is because of how long it can take a new version of Python to become
> "standard", especially in the 3.x series since the entire 3.x series itself
> isn't standard, any changes made to the standard lib won't be usable for
> years and years. This can be mitigated by releasing a backport on PyPI, but
> if every version of Python but the latest one is going to require installing
> these libs from PyPI in order to usefully interact with the "world", then you
> might as well just require all versions of Python to install bits from PyPI.

Well, other approaches have been looked at - for example, accepting things
into the stdlib but warning users about the provisional nature of some APIs.

I think that where interoperability between different packaging tools is
needed, that's where the argument for something in the stdlib is strongest,
as Brett said.

> Yet another is by blessing a particular implementation, that implementation's
> behaviors become the standard (indeed the way the PEP system generally works
> for this is once it's been added to the standard lib the PEP is a historical
> document and the documentation becomes the standard). However packaging is

That's because the PEP is needed to advocate the inclusion in the stdlib and
as a record of the discussion and rationale for accepting/rejecting whatever
was advocated, but there's no real benefit in keeping the PEP updated as the
stdlib component gets refined from its real-world exposure through being in
the stdlib.

> not like Enums or urllibs, or smtp. We are essentially defining a protocol,
> one that non-Python tools will be expected to use (for Debian and RPMs for
> example). We are using these PEPs more like an RFC than a proposal to include
> something in the stdlib.

But we can assume that there will either be N different implementations of
everything in the RFCs from the ground up, by N different tools, or ideally
one canonical implementation in the stdlib that the tool makers can use (but
are not forced to use if they don't want to). You might say that if there
were some kick-ass implementation of these RFCs on PyPI people would just
gravitate to it and the winner would be obvious, but I don't see things
working like that. In the web space, look at HTTP Request/Response objects
as an example: Pyramid, Werkzeug, Django all have their own, don't really
interoperate in practice (though it was a goal of WSGI), and there's very
little to choose between them technically. Just a fair amount of duplicated
effort on something so low-level, which would have been better spent on
truly differentiating features.

> There's also the case of usefulness. You mention some code that can parse the
> JSON metadata and validate it. Well, presumably we'll have the metadata for
> 2.0 set down by the time 3.4 comes around. So sure 3.4 could have that, but
> then maybe we release metadata 2.1 and now 3.4 can only parse _some_ of the
> metadata. Maybe we release a metadata 3.0 and now it can't parse any
> metadata. But even if it can parse the metadata what does it do with it? The
> major places you'd be validating the metadata (other than merely consuming
> it) is either on the tools that create packages or in PyPI performing checks
> on a valid file upload. In the build tool case they are going to either need
> to write their own code for actually creating the package or, more likely,
> they'll reuse something like distlib. If those tools are already going to be
> using a distlib-like library then we might as well just keep the validation code
> in there.

Is that some blessed-by-being-in-the-stdlib kind of library that everyone
uses, or one of several balkanised versions a la HTTP Request / Response? If
it's not somehow blessed, why should a particular packaging project use it,
even if it's technically up to the job?

> Now the version parsing stuff which I said I could be convinced is slightly
> different. It is really sort of its own thing. It's not dependent on the
> other pieces of packaging to be useful, and it's not versioned. It's also the
> only bit that's really useful on its own. People consuming the (future) PyPI
> API could use it to fully depict the actual metadata so it's kind of like
> JSON itself in that regard.

That's only because some effort has gone into looking at version
comparisons, ordering, pre-/post-/dev-releases, etc. and considering the
requirements in some detail. It looks OK now, but so did PEP 386 to many
people who hadn't considered the ordering of dev versions of
pre-/post-releases. Who's to say that some other issue won't come up that we
haven't considered? It's not a reason for doing nothing.
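To make the ordering question concrete, here is a toy sort key in Python. It
is only a sketch: the toy_key function, its regular expression and the rules
it encodes (a dev release sorts before the thing it is a dev release of, a
post release after it) are assumptions chosen for illustration, not the
scheme of PEP 386 or of any other PEP.

```python
import re

# Toy grammar: N(.N)*[{a|b|rc}N][.postN][.devN]
_TOY_VERSION = re.compile(
    r"^(\d+(?:\.\d+)*)(?:(a|b|rc)(\d+))?(?:\.post(\d+))?(?:\.dev(\d+))?$"
)
_PRE_RANK = {"a": 0, "b": 1, "rc": 2}


def toy_key(version):
    """Return a tuple that sorts versions under the toy rules above."""
    match = _TOY_VERSION.match(version)
    if match is None:
        raise ValueError("unsupported version: %r" % version)
    release, pre_tag, pre_num, post, dev = match.groups()
    release_key = tuple(int(part) for part in release.split("."))
    if pre_tag is not None:
        pre_key = (_PRE_RANK[pre_tag], int(pre_num))
    elif dev is not None and post is None:
        # A bare dev release (1.0.dev1) sorts before every pre-release.
        pre_key = (-1, 0)
    else:
        # A final release sorts after every pre-release.
        pre_key = (3, 0)
    post_key = int(post) if post is not None else -1
    # A dev release sorts before the release it leads up to.
    dev_key = (0, int(dev)) if dev is not None else (1, 0)
    return (release_key, pre_key, post_key, dev_key)


versions = [
    "1.0.post1", "1.0", "1.0a1.post1", "1.0.dev1", "1.0b1",
    "1.0a1", "1.0a1.post1.dev1", "1.0.post1.dev1", "1.0a1.dev1",
]
ordered = sorted(versions, key=toy_key)
print(ordered)
```

Cases such as 1.0a1.post1.dev1 are exactly the kind that are easy to overlook
until someone tries to write the key function down.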

> On the installer side of things, the purist side of me doesn't like adding it to
> the standard library for all the same reasons but the pragmatic side of me
> wants it there because it enables fetching the other bits that are needed for
> "pip install X" to be a reasonable official response to these kind of
> questions. But I pushed for and still believe that if a prerequisite for
> doing that involves "locking" in pip or any of its dependencies by adding
> them to the standard library then I am vehemently against doing it.

Nobody seems to be suggesting doing that, though.

Regards,

Vinay Sajip