[Catalog-sig] What is the point of pythonpackages.com?

Wed Feb 8 14:21:20 CET 2012

Hi there,

On Wed, Feb 8, 2012 at 1:21 AM,  <martin at v.loewis.de> wrote:
[snip]
> Why not? People certainly use it as an archive - and the same people
> suggesting that PyPI should host all files also insist that old releases
> should never be deleted from PyPI (making it an archive).

There are various semantics of the word "archive" involved.

One way to see an archive is as a repository of out-of-date historical
information. Its only interest is historical, so broken links are
acceptable in such an archive.

Another way to see an archive is as a repository of information that
is current. While some bits (releases) were added to the archive long
ago, they still see active use every day.

(There's a grey area. Python 1.5.2 is historical to most of us, but is
undoubtedly still in active use somewhere. If it were to disappear
from python.org, the people complaining would have a legitimate
grievance: removing it would, I think, needlessly hurt its users for
little gain to the python.org maintainers. But for most of us the
1.5.2 download page is of historical value only.)

PyPI is both. It is an archive of historical information, which is why
links in its metadata and documentation should be allowed to remain
even after the outside world has changed and they have broken. It is
also a repository of current information, where we want the metadata,
and in particular the *releases*, to remain available.

If an archive contained no links to the outside world at all, it
would automatically be both historical and current (leaving aside
deliberate modifications to the archive). But PyPI does contain
links, and in particular links to releases.

What creates tension between the two uses of PyPI is that releases,
in the "repository of current information" sense, behave more like
metadata than like links. PyPI retains old metadata, but *links* to
releases can break. If a release is uploaded to PyPI, it will remain
(unless someone actively removes it), and this permanence is under
PyPI's control, just like that of metadata. If instead a *link* to a
release is uploaded to PyPI, PyPI can only maintain it in the
"historical archive" sense: it may be current or outdated, and PyPI
can neither prevent nor control that.

If PyPI *only* contained links to releases and didn't host releases
itself, we would either not have the automatic download tools we have
now, or we would have caching or repository technology ensuring that
releases *can* be reliably accessed, reducing the points of failure.
If PyPI *only* contained releases and no links to releases, we wouldn't
be having this discussion (we'd only have the discussion about people
actively removing old releases). But PyPI does both, and that is what
creates the complexity.

I think the best way out would be to put caching technology for
actively used releases into practice (where the license information in
the PyPI metadata allows such caching). This is a technical solution
that can be worked on independently.
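To make the caching idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: fetch_release, license_allows_caching and download are hypothetical names, not an existing tool's API, and how one decides from PyPI metadata that a license permits caching is left out (it is simply passed in as a boolean). The point it shows is that once a release has been cached, a later breakage of the external link no longer stops downloads.

```python
import hashlib
from pathlib import Path
from typing import Callable


def fetch_release(url: str, cache_dir: Path, license_allows_caching: bool,
                  download: Callable[[str], bytes]) -> bytes:
    """Return the release at `url`, preferring a locally cached copy.

    Hypothetical helper: `download` is any callable that fetches the bytes
    behind an external release link and raises if the link is broken.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Key the cache entry by a hash of the link, so any URL maps to a safe filename.
    cached = cache_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()
    if cached.exists():
        # A cached copy survives even if the external link later breaks.
        return cached.read_bytes()
    data = download(url)  # may raise if the external link is already broken
    if license_allows_caching:
        # Only keep a copy when the (assumed) license check permits it.
        cached.write_bytes(data)
    return data
```

In practice `download` could be something like `lambda u: urllib.request.urlopen(u).read()`; passing it in as a parameter keeps the caching policy separate from the transport.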

Regards,

Martijn