[Catalog-sig] Deprecate External Links

Donald Stufft donald.stufft at gmail.com
Thu Feb 28 01:36:54 CET 2013


On Wednesday, February 27, 2013 at 7:08 PM, PJ Eby wrote:
> On Wed, Feb 27, 2013 at 6:16 PM, Aaron Meurer <asmeurer at gmail.com> wrote:
> > As far as I'm concerned, this is all about helping package
> > maintainers. The way pip works now, every time I do a release
> > candidate, pip automatically installs it, even though I only upload it
> > to Google Code. I don't want it to do this, but the only way around
> > it would be either 1. give it some weird name so that pip doesn't
> > think it is newer 2. upload it somewhere else or 3. go in to PyPI and
> > remove all mentions of Google Code from the index.
> > 
> 
> 
> There's also a *fourth* way, which I asked the PyPI developers many
> years ago to do, which is to stop including download links on the
> /simple index for "hidden" (i.e., non-current) releases.
> 
> (Something I am still in favor of, btw. Jim Fulton argued against it,
> IIRC, and it ended in a stalemate. However, I don't think we
> discussed distinguishing PyPI downloads from other downloads, just
> getting rid of old links in general)
> 
> Frankly, just dropping /simple links for hidden releases would also
> fix a good chunk of expired domain, stale releases, too many
> downloads, etc. In addition, if a project migrates to using PyPI
> uploads, they will not still be subject to external downloads for
> older versions being crawled.
> 
> So, if we must do away with the links, I would suggest that the phases be:
> 
> 1. Remove homepage/download URLs for "hidden" versions from the
> /simple index altogether (leaving PyPI download links available)
> 2. Remove the rel="..." attributes from the remaining download and
> home page links (this will stop off-site crawling, but not off-site
> downloading)
> 3. Re-evaluate whether anything else actually needs to be removed.
> 
> 
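For context on phase 2: installers decide which links to follow off-site by
looking at the rel attribute on anchors in the /simple page, so stripping
rel="homepage"/rel="download" stops the crawl without touching the links
themselves. A minimal sketch of that distinction (the page snippet and class
are illustrative, not the actual pip/setuptools code):

```python
from html.parser import HTMLParser

class SimpleIndexParser(HTMLParser):
    """Split links on a /simple page into PyPI-hosted files and the
    off-site pages an installer would spider via rel attributes."""

    def __init__(self):
        super().__init__()
        self.direct_links = []  # files hosted on PyPI itself
        self.crawl_links = []   # off-site pages flagged for spidering

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href", "")
        if attrs.get("rel") in ("homepage", "download"):
            self.crawl_links.append(href)
        else:
            self.direct_links.append(href)

# Hypothetical /simple/example/ page contents:
page = (
    '<a href="Example-1.0.tar.gz#md5=abc">Example-1.0.tar.gz</a>'
    '<a rel="homepage" href="http://example.org/">home page</a>'
    '<a rel="download" href="http://example.org/dist/">downloads</a>'
)

parser = SimpleIndexParser()
parser.feed(page)
print(parser.direct_links)  # the PyPI-hosted file stays either way
print(parser.crawl_links)   # these disappear from crawling once rel is dropped
```

Removing the rel attributes (phase 2) empties crawl_links while leaving the
anchors in place, which is why it stops off-site crawling but not off-site
downloading.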

This seems a bit complicated. People generally don't even know that
the external link spidering exists, much less understand the intricacies
of which types of links get spidered when. A simple "after X date no new
URLs will be added, and after Y date all existing URLs will be removed"
removes ambiguity from the process. Rules like "this kind of link will be
removed on Y, and that matters under Z conditions" lead to a lot of
confusion about what does and doesn't work.
> 
> Basically, 99% of the complaints here are lumping together all of
> these different kinds of links -- stale links, spidered links, and
> plain external download links -- even though they don't create the
> same sorts of problems. Taking it in stages will give authors time to
> change processes, while still getting rid of the biggest problem
> sources right away (stale homepage/download URLs).
> 
> 

My complaint is with external URLs at all, for a myriad of reasons, some
specific to particular cases of them, some not.
> 
> The first of these changes could be done now, though I'd check with
> Jim about the buildout use case; IIRC it was to allow pinned
> versions. But if the main use cases also had eggs on PyPI rather than
> downloading them from elsewhere, then removing *just* the
> homepage/download links would clean things up nicely, including your
> runaway Google Code downloads, without needing to change any installer
> code that's out in the field right now.
> _______________________________________________
> Catalog-SIG mailing list
> Catalog-SIG at python.org
> http://mail.python.org/mailman/listinfo/catalog-sig
> 
> 
