[Distutils] PEP 527 - Removing Un(der)used file types/extensions on PyPI

Donald Stufft donald at stufft.io
Thu Aug 25 13:12:18 EDT 2016


> On Aug 25, 2016, at 12:43 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> 
> Aside: If pip is considered the only user of PyPI, I do wonder,
> why we bother having a user readable download index at all ;-)
> 


It’s not considered the *only* user, but it is the primary user of
the files that get uploaded to PyPI because, well, it is the primary
thing downloading from PyPI by a significant margin. Here’s some
data for the last 30 days:

Total Downloads: 543,475,640

1.  pip:            346,682,962 (64%)
2.  bandersnatch:   112,405,165 (21%)
3.  null:           40,803,919  (8%)
4.  setuptools:     31,638,840  (6%)
5.  distribute:     6,661,875   (1%)
6.  Browser:        2,247,006   (0.4%)
7.  requests:       1,296,017   (0.2%)
8.  pex:            736,028     (0.1%)
9.  devpi:          357,957     (0.07%)
10. z3c.pypimirror: 294,985     (0.05%)
11. Artifactory:    169,241     (0.03%)
12. OS:             103,323     (0.02%)
13. Homebrew:       42,252      (0.01%)
14. conda:          36,070      (0.01%)
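
As a quick sanity check, the shares can be recomputed from the raw
counts. This is only an illustrative sketch: the `downloads` dict and
the `share` helper are names made up here, and the counts are simply
copied from the list above.

```python
# Download counts copied from the list above (last 30 days).
downloads = {
    "pip": 346_682_962,
    "bandersnatch": 112_405_165,
    "null": 40_803_919,
    "setuptools": 31_638_840,
    "distribute": 6_661_875,
    "Browser": 2_247_006,
    "requests": 1_296_017,
    "pex": 736_028,
    "devpi": 357_957,
    "z3c.pypimirror": 294_985,
    "Artifactory": 169_241,
    "OS": 103_323,
    "Homebrew": 42_252,
    "conda": 36_070,
}

total = sum(downloads.values())


def share(client):
    """Return the percentage of all downloads made by `client`."""
    return 100.0 * downloads[client] / total


# Print the clients from most to least downloads with their share.
for client, count in sorted(downloads.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{client:<16}{count:>14,}  {share(client):6.2f}%")
```

The per-client counts add up exactly to the total given above, so the
percentages are shares of everything that was counted.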

The bandersnatch downloads likely also represent largely pip
downloads (or zero downloads) made through the mirrors it feeds, and
are very unlikely to represent manual downloads. Here, null is
basically anything our UA parsing couldn’t identify. Generally that
means older versions of pip, where we used the default urllib2 user
agent, but it could also be people with custom urllib2-based scripts
or something. Also, things like curl/wget and such get rolled into
“Browser”.
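
To make the bucketing concrete, here is a rough sketch of how
User-Agent strings could be sorted into the categories above. It is
not the parser actually used for these numbers; the `RULES` table and
the `classify` function are hypothetical and the patterns are
simplified assumptions.

```python
import re

# Hypothetical rules, checked in order; the first match wins.
# These patterns are assumptions for illustration, not the real parser.
RULES = [
    (re.compile(r"^pip/"), "pip"),
    (re.compile(r"^bandersnatch/"), "bandersnatch"),
    (re.compile(r"setuptools/"), "setuptools"),
    # curl, wget and real browsers all get rolled into "Browser".
    (re.compile(r"^(curl|Wget)/|^Mozilla/"), "Browser"),
]


def classify(user_agent):
    """Return a bucket name for a User-Agent string, or None if unrecognized."""
    for pattern, bucket in RULES:
        if pattern.search(user_agent):
            return bucket
    return None  # would show up as "null" in the list above


# A bare urllib2 default user agent matches no rule, so it lands in "null".
print(classify("Python-urllib/2.7"))
print(classify("curl/7.43.0"))
```

A client that only sends the default urllib2 user agent cannot be told
apart from any other urllib2-based script, which is why those requests
end up in null.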

This data is all available for anyone to query in BigQuery if they
have a Google account. One of the things I generally try to do is
make data-based decisions with PyPI/pip/etc, which is why I’ve put
a fair amount of effort into creating a data pipeline that we can
query and ask questions of instead of trying to guess. Of course,
no data is perfect, so some amount of guessing is unavoidable, but
at least we can make educated guesses. In this case, the data
unequivocally shows that automatic downloads through some sort of
tooling are the primary use case for files uploaded to PyPI, and thus
decisions should be made to optimize that particular use case.

Given all of that, the new PyPI design de-emphasizes the downloads
in the UI (without removing them) while trying to put a higher
emphasis on the links provided by the project itself (and of course,
its description). If we were to add some type of file where the
primary purpose was for users to manually download it, rather than
some tool, then we’d want to rework that so that we separated these
two different kinds of files, the ones aimed mostly at tooling vs the
ones aimed mostly at humans, and properly emphasize each.

—
Donald Stufft




