[Distutils] Outdated packages on pypi

Nick Coghlan ncoghlan at gmail.com
Sat Jul 23 00:20:41 EDT 2016


[Good replies from Donald, Paul, et al already, but rather than
replying to individual points, I figure it's best to just respond to
Chris's original question with my own thoughts]

On 23 July 2016 at 01:47, Chris Barker - NOAA Federal
<chris.barker at noaa.gov> wrote:
> Right now, the barrier to entry to putting a package up on PyPI is
> very, very low. There are a lot of 'toy', or 'experimental', or 'well
> intentioned but abandoned' projects on there.
>
> If there was a clear and obvious place for folks to put these packages
> while they work out the kinks, and determine whether the package is
> really going to fly, I think people would use it.

That place is PyPI. Having a separate "maybe good, maybe bad" location
for experimental packages (which is the way a lot of people use GitHub
repos these days, relying on direct-from-VCS installs) leads to a
persistent problem where folks later decide "I want to publish this
officially", go to claim the name on PyPI as well and find they have a
conflict. As further examples of similar "multiple authoritative
namespaces" problems, we sometimes see folks creating Python projects
specifically for Linux distros rather than for upstream Python,
causing name collisions when the upstream project is later packaged for that
distro (e.g. python-mock conflicting with Fedora's RPM "mock" build
tool - resolved by the Fedora library being renamed to "mockbuild"
while keeping "mock" as the CLI name), and we also see cross-ecosystem
conflicts (e.g. python-pip and perl-pip conflicting on the "pip" CLI
name, resolved by the good graces of the perl-pip maintainer in ceding
the unqualified name to the Python package).

You can also look at the number of semantically versioned packages
that enjoy huge adoption even before the author(s) pull the trigger on
a 1.0 release (e.g. requests, SQLAlchemy, the TOML spec off the top
of my head), which reveals that package authors often have higher
standards for "good enough for 1.0" than their prospective users do
(the standard for users is generally "it's good enough to solve my
current problem", while the standard for maintainers is more likely to
be "it isn't a hellish nightmare to maintain as people start finding
more corner cases that didn't previously occur to me/us").

The other instinctive answer is "namespaces solve this!", but the
truth is they don't, as:

1. Namespaces tend to belong to organisations, and particularly for
utility projects, "this utility happened to be developed by this
organisation" is entirely arbitrary and mostly irrelevant (except
insofar as if you trust a particular org you may trust their libraries
more, but you can get that from the metadata).
2. If you want to build a genuinely inclusive open source project,
branding it with the name of your company is one of the *worst* things
you can do (since it prevents any chance of a feeling of shared
ownership by the entire community)
3. Python already allows distribution package names to differ from
import package names, as well as supporting namespace packages, which
means folks *could* have adopted namespaces-by-convention if they were
a genuinely compelling solution. That hasn't happened, which suggests
there's an inherent user experience problem with the idea.
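To make point 3 concrete, here's a minimal setup.py sketch (the project
and names are hypothetical, chosen purely for illustration): the name a
distribution registers on PyPI is independent of the name users type
after "import":

```python
# Hypothetical setup.py sketch showing that the distribution name on PyPI
# can differ from the import package name.
from setuptools import setup

setup(
    name="acme-utils",       # distribution name: "pip install acme-utils"
    version="0.1.0",
    packages=["acmeutils"],  # import package name: "import acmeutils"
)
```

A real-world instance of the same split is python-dateutil, which is
installed under that distribution name but imported as "dateutil", so a
namespacing convention at the distribution level was always possible
without any new infrastructure.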

Would requests be more discoverable if Kenneth had called it
"kreitz-requests" instead? What if we had "org-pocoo-flask" instead of
just plain "flask"? Or "ljworld-django"? How many folks haven't even
looked at "zope.interface" because the association with Zope prompts
them to dismiss it out of hand?

Organisational namespaces on a service like GitHub are *absolutely*
useful, but what they're replacing is the model where different
organisations run their own version control server (just as different
organisations may run their own private Python index today), rather
than being a good thing to expose directly to end users that just want
to locate and use a piece of software.

> However, in these discussions, I've observed a common theme: folks in
> the community bring up issues about unmaintained packages, namespace
> pollution, etc., and the core PyPA folks respond with generally well
> reasoned arguments why proposed solutions won't fly.
>
> But it's totally unclear to me whether the core devs don't think these
> are problems worth addressing, or think they can only be addressed
> with major effort that no one has time for.

If we accept my premise that "single global namespace, flat except by
convention" really does offer the most attractive overall user
experience for a software distribution ecosystem, what's missing from
the status quo on PyPI?

In a word? Gardening.

Post-publication curation like that provided by Linux distros and
other redistributor communities (including conda-forge) can help with
the "What's worth my time and attention?" question (we can think of
this as filtering the output of an orchard, and only passing along the
best fruit), but it can't address the problem of undesirable name
collisions and other problems on the main index itself (we can think
of this as actually weeding the original orchard, and pruning the
trees when necessary).

However, we don't have anyone that's specifically responsible for
looking after the shared orchard that is PyPI, and this isn't
something we can reasonably ask volunteers to do, as it's an intensely
political and emotionally draining task where your primary
responsibility is deciding if and when it's appropriate to *take
people's toys away*. As an added bonus, becoming more active in
content curation as a platform provider means potentially opening
yourself up to lawsuits as well, if folks object to either your
reclamation of a name they previously controlled, or else your refusal
to reclaim a particular name for *their* purposes.

So while first-come-first-served namespace management definitely
doesn't provide the best possible user experience for Pythonistas, it
*does* minimise the volume of ongoing namespace maintenance work
required, as well as the PSF's exposure to legal liability as the
platform provider.

These concerns aren't something that "policy enforcement can be
automated" really addresses either, as even if you have algorithmic
enforcement, those "The robots have decided to take your project away
from you" emails are still going to be going to real people, and those
folks may still be understandably upset. "Our algorithms did it, not
our staff" is also a pretty thin legal reed to pin your hopes on if
somebody turns out to be upset enough to sue you about it.

This means the entire situation changes if the PSF's Packaging Working
Group receives sufficient funding (either directly through
https://donate.pypi.io or through general PSF sponsorships) to staff
at least one full-time "PyPI Publisher Support" role (in addition to
the other points on the WG's TODO list), as well as to pay for
analyses of the legal implications of having more formal content
curation policies.

However, in the absence of such ongoing funding support, the current
laissez-faire policies will necessarily remain in place, as they're
the only affordable option.

Regards,
Nick.

P.S. If folks who work at end user organisations for whom contributing
something like $10k a year to the PSF for a Silver sponsorship
(https://www.python.org/psf/sponsorship/) would be a rounding error
in the annual budget want to see change in this kind of area, then
advocating within your organisation to become PSF sponsor members on
the basis of "We want to enable the PSF to work on community
infrastructure improvement activities it's currently not pursuing for
lack of resources" is probably one of *the most helpful things you can
do* for the wider Python community. As individuals, we can at most
sustainably contribute a few hours a week as volunteers, or maybe
wrangle a full-time upstream contribution role if we're lucky enough
to have or find a supportive employer. By contrast, the PSF is in a
position to look for ways to let volunteers focus their time and
energy on activities that are more inherently rewarding, by directing
funding towards the many necessary-but-draining activities that go
into supporting a collaborative software development community. The
more resources the PSF has available, the more time and energy it will
be able to devote towards identifying and addressing unwanted sources
of friction in the collaborative process.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

