[Distutils] PEP 527 - Removing Un(der)used file types/extensions on PyPI

Donald Stufft donald at stufft.io
Tue Aug 23 15:57:39 EDT 2016


> On Aug 23, 2016, at 3:03 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> 
> On 23.08.2016 18:46, Donald Stufft wrote:
>> Since it seemed like there was enough here for a proper PEP, I went ahead and
>> wrote one up, which is now PEP 527. The tl;dr of it is that:
>> 
>> * Everything but sdist, bdist_wheel, and bdist_egg get deprecated.
> 
> -1 on removing bdist_wininst and bdist_msi. If PyPI is supposed
> to retain the status of the main website to go search for Python
> package downloads, it needs to be able to provide ways of hosting
> all distribution types which are supported by distutils, including
> ones which target platform configuration management systems such as
> the Windows one.

I could not disagree more that PyPI needs to support hosting all
distribution types that distutils supports. For one, we’re quickly
getting to a place where distutils is no longer special, so what
distutils does is just an implementation detail of one particular
tool. More importantly, these formats are not all equally useful.
Take the most obvious one: bdist_dumb creates a tarball with zero
metadata and hardcoded paths like:

./Users/dstufft/.pyenv/versions/3.5.2/lib/python3.5/site-packages/cryptography/__about__.py
./Users/dstufft/.pyenv/versions/3.5.2/lib/python3.5/site-packages/cryptography/__pycache__/__about__.cpython-35.pyc

If I uploaded this file, it would be useful to basically nobody except
people who happen to have the username ‘dstufft’, are on macOS, use
pyenv, and have installed CPython 3.5.2. And not only is it useless,
it’s actively harmful, because a bdist_dumb has a filename that is also
a valid sdist filename if you’re using the legacy version parser (which
we have to do for backwards compatibility). This forces installers to
carry weird hacks just to interact with PyPI, instead of handling it in
a much smoother and simpler way. For example, we have gross hacks like:

    
    if "macosx10" in link.path and ext == '.zip':
        self._log_skipped_link(link, 'macosx10 one')
        return

These exist to work around cases where we found popular packages shipping
a bdist_dumb that we couldn’t differentiate from an sdist.
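To make the ambiguity concrete, here is a minimal sketch of a permissive, legacy-style sdist filename parser. distutils names a bdist_dumb archive after the project’s full name plus the platform tag, so a macOS build of cryptography 1.5 would plausibly be called `cryptography-1.5.macosx-10.11-x86_64.tar.gz`. The function names, the extension list, and the simplified PEP 440 check below are all hypothetical illustrations, not pip’s or PyPI’s actual code.

```python
import re

# Hypothetical list of extensions a legacy installer might treat as sdists.
SDIST_EXTENSIONS = (".tar.gz", ".tar.bz2", ".zip")

# Very simplified stand-in for PEP 440 (release, optional pre/post/dev parts).
# Illustration only; this is not the full PEP 440 grammar.
PEP440_LIKE = re.compile(r"^\d+(\.\d+)*((a|b|rc)\d+)?(\.post\d+)?(\.dev\d+)?$")


def parse_sdist_filename(filename):
    """Split '<name>-<version><ext>' the way a permissive legacy parser might.

    Returns (name, version), or None if the filename doesn't fit the pattern.
    """
    for ext in SDIST_EXTENSIONS:
        if filename.endswith(ext):
            stem = filename[: -len(ext)]
            break
    else:
        return None
    # Split at the first '-' that is followed by a digit; everything after it
    # is accepted as the "version", however odd it looks.
    match = re.match(r"^(?P<name>.+?)-(?P<version>\d.*)$", stem)
    if match is None:
        return None
    return match.group("name"), match.group("version")


def looks_like_pep440(version):
    return PEP440_LIKE.match(version) is not None


for filename in (
    "cryptography-1.5.tar.gz",                      # a real sdist
    "cryptography-1.5.macosx-10.11-x86_64.tar.gz",  # a bdist_dumb
):
    name, version = parse_sdist_filename(filename)
    print(filename, "->", name, version, "PEP 440:", looks_like_pep440(version))
```

Both filenames parse "successfully" as sdists. Only a stricter version check tells them apart, and a legacy parser can’t apply that check without breaking old releases, which is why installers end up with special cases like the `macosx10` one.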

Now, maybe you agree with me that bdist_dumb is, well, dumb, and that
we could probably do without it. But if you do, then you also agree
with me that the premise that “PyPI must support everything distutils
does” is flawed, and we can then discuss which formats make sense to
support and which don’t.

> 
> The number of downloads is really irrelevant for this kind of
> argument. Since the PEP proposes to keep the existing uploads
> around, I also don't follow the argument of reduced maintenance.
> PyPI will still have to host and support downloading those file
> types.

Continuing to support a file that already exists is trivial: as far
as PyPI is concerned, at that point it’s nothing more than a binary
blob that is accessible at a particular URL. However, PyPI does need
to do work when a file is uploaded. For instance, it needs to verify
that the uploaded file is valid, that it’s for the project it claims
to be for, etc. To do this, PyPI has to know things about the file
format itself and what it can expect from it. One bug that has
cropped up from time to time is people accidentally uploading a
package whose internal metadata says version “1.0” when they told
PyPI, at registration, that it was version “1.0a1” or something like
that, which causes a lot of the tooling to do subtly weird and broken
things. PyPI should be double-checking the internal metadata of these
files, but it can’t do that unless it can rely on that metadata
existing in those files, and the check has to be implemented for each
file type (and then maintained).
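As a rough illustration of the per-format work involved, here is a minimal sketch of that double check for a `.tar.gz` sdist alone. It assumes the sdist has a top-level `<name>-<version>/PKG-INFO` file in RFC 822 header format, and compares its `Name` (normalized as in PEP 503) and `Version` against what the uploader claimed. The function names are hypothetical and this is not PyPI’s actual code; every other accepted format would need its own equivalent of `read_sdist_metadata`.

```python
import io
import re
import tarfile
from email.parser import HeaderParser


def canonicalize_name(name):
    # PEP 503 normalization: runs of '-', '_', '.' become '-', lowercased.
    return re.sub(r"[-_.]+", "-", name).lower()


def read_sdist_metadata(data):
    """Return (name, version) from the top-level PKG-INFO of a .tar.gz sdist."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        for member in tar.getmembers():
            parts = member.name.split("/")
            # Assumed layout: <name>-<version>/PKG-INFO
            if len(parts) == 2 and parts[1] == "PKG-INFO" and member.isfile():
                raw = tar.extractfile(member).read().decode("utf-8")
                headers = HeaderParser().parsestr(raw)
                return headers["Name"], headers["Version"]
    raise ValueError("no top-level PKG-INFO found")


def check_upload(data, claimed_name, claimed_version):
    """Reject an upload whose internal metadata disagrees with the claim."""
    name, version = read_sdist_metadata(data)
    if canonicalize_name(name) != canonicalize_name(claimed_name):
        raise ValueError(f"file is for {name!r}, not {claimed_name!r}")
    if version != claimed_version:
        raise ValueError(f"file contains version {version!r}, not {claimed_version!r}")


def make_sdist(name, version):
    """Build a tiny in-memory sdist containing only PKG-INFO (demo helper)."""
    pkg_info = f"Metadata-Version: 1.1\nName: {name}\nVersion: {version}\n".encode("utf-8")
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo(f"{name}-{version}/PKG-INFO")
        info.size = len(pkg_info)
        tar.addfile(info, io.BytesIO(pkg_info))
    return buf.getvalue()


data = make_sdist("example", "1.0")
check_upload(data, "example", "1.0")  # metadata matches, so no error
try:
    check_upload(data, "example", "1.0a1")
except ValueError as exc:
    print(exc)  # file contains version '1.0', not '1.0a1'
```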

The number of downloads *is* relevant, though, because it lets us
gauge how many people are using these files, and therefore what the
breakage would look like, which helps me decide where my time is
spent. If very few people are using something, then it’s very likely
not worth my time to continue supporting it (and I don’t see anyone
else leaping forward to commit to maintaining support for these
things in PyPI).

> 
> To me, all this sounds a lot like eventually turning PyPI into a
> pip package index, which no longer serves the original intent of
> a Python package index. I think that's taking a wrong turn in the
> development of such an index.

Considering PyPI didn’t originally allow uploading files at all, I
don’t see how disallowing uploads of some file types somehow breaks
the original intent. That being said, PyPI has *two* sort-of-related
functions. One is to let people discover projects, which has nothing
to do with uploading files (indeed, originally this was all PyPI did;
there was no concept of file downloads), and for that nothing is
changing. The other is to act as a repository for pip/setuptools/etc.,
and for that you need file uploads and downloads. Nobody is stopping
anyone from linking to any sort of file they want in their project
description or in their project URLs.

> 
> IMO, we should aim to reunite separate indexes such as the
> one used for conda or the win32 index maintained by
> Christoph Gohlke back into PyPI, not create even more
> separation by removing platform specific formats.

I have zero interest in turning PyPI into a conda, rpm, deb, or any
other system level package manager index and it would only be over
my very strenuous objection that such a thing ever got added. It is
not appropriate for PyPI at all.

As for Christoph’s work, there’s nothing that he’s doing that
couldn’t exist on PyPI today *except* that he is not the author of
those packages and is thus providing unofficial, third-party binaries,
which again is not appropriate for PyPI itself.

> 
>> * The only allowed extension for sdist is ``.tar.gz``.
> 
> Strong -1 on this part. .tar.gz may be a good choice for Unix,
> but it definitely isn't for Windows. Even for Unix, .zip files
> have the advantage of not messing up file ownerships and
> permissions.

Both .tar.gz and .zip have advantages and disadvantages; it’s trivial
to sit there and go back and forth about one or the other. However,
having more than one extension makes PyPI’s and pip’s jobs harder, so
we should pick just one and standardize on it. I think it should be
.tar.gz, because anything else is a larger change for no real benefit.
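With exactly one extension per file type, the upload-side check collapses into a small lookup table. A minimal sketch, assuming the policy summarized at the top of this thread (only sdist, bdist_wheel and bdist_egg, with `.tar.gz` as the only sdist extension); `ALLOWED_EXTENSIONS` and `is_allowed_upload` are hypothetical names, not PyPI’s code.

```python
# Hypothetical upload policy following the proposal as summarized in this
# thread: only sdist, bdist_wheel and bdist_egg, with ".tar.gz" for sdists.
ALLOWED_EXTENSIONS = {
    "sdist": (".tar.gz",),
    "bdist_wheel": (".whl",),
    "bdist_egg": (".egg",),
}


def is_allowed_upload(filetype, filename):
    """Return True if this (filetype, filename) pair would be accepted."""
    extensions = ALLOWED_EXTENSIONS.get(filetype)
    if extensions is None:
        return False  # e.g. bdist_dumb, bdist_wininst, bdist_msi
    # str.endswith accepts a tuple of suffixes.
    return filename.endswith(extensions)


for filetype, filename in [
    ("sdist", "example-1.0.tar.gz"),
    ("sdist", "example-1.0.zip"),
    ("bdist_wheel", "example-1.0-py3-none-any.whl"),
    ("bdist_dumb", "example-1.0.macosx-10.11-x86_64.tar.gz"),
]:
    print(filetype, filename, is_allowed_upload(filetype, filename))
```

Each extra extension admitted for a type is another branch that both the uploader-side validation and every installer’s filename parsing have to handle.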

—
Donald Stufft




