[Catalog-sig] Suggested change to /simple index

P.J. Eby pje at telecommunity.com
Thu Jul 29 19:51:27 CEST 2010


Recently, a proposal was made to change the sorting of links on 
PyPI's /simple index to prevent problems with easy_install finding 
out-of-date non-PyPI download links.  That proposal, unfortunately, 
would not have solved the actual problem.

After giving it some thought, I have an alternative proposal that I 
think *would* solve the problem, and that would work for all scraping 
tools using the /simple index, not just easy_install.

Essentially, the problem is that when links to "hidden" versions were 
added to the /simple index (to satisfy users wanting to be able to 
download older versions' distributions), in-description and 
home/download page links were included.  However, if a package's home 
page URL or revision control download links change over time, the 
older ones still show up in the /simple listing, leading to ambiguity 
for download tools.

However, since the actual use case for which this was added was only 
to support reaching specific older versions of a project, it isn't 
actually necessary to include links that aren't to downloadable files 
with a specific version number.

Say package Foo releases version 1.1, causing 1.0 to become 
hidden.  People still want to be able to download the 1.0's .tgz's or 
.rpm's or what-have-you's.  However, they do *not* still need to be 
able to access the project's older, now-defunct home page, or any of 
the extra links included in the older version's description.

It is these extraneous links that cause the problem, not the access 
to PyPI-hosted archives.

Now, it could be argued that if a project used its "download" or 
"home page" link (or even in-description links) to point to actual 
archives, then those older links would be lost by omitting such links 
for "hidden" versions.  However, if that's really a problem, it could 
be remedied by simply checking whether the URL contains a file 
extension, or a revision number, or something like that.
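
As a rough illustration of that kind of check, here is a minimal Python 
sketch.  The function name, the extension list, and the version pattern 
are all my own assumptions for the sake of the example; nothing here is 
existing PyPI or easy_install code, and a real heuristic would likely 
need tuning.

```python
import re
from urllib.parse import urlparse

# Hypothetical list of extensions treated as "downloadable files".
ARCHIVE_EXTENSIONS = (".tar.gz", ".tgz", ".tar.bz2", ".zip", ".egg", ".exe", ".rpm")

# Hypothetical pattern for a dotted version number such as 1.0 or 2.3.1.
VERSION_RE = re.compile(r"\d+(\.\d+)+")


def looks_like_versioned_download(url):
    """Guess whether a URL points at a specific downloadable release."""
    path = urlparse(url).path.lower()
    # Look only at the last path segment, ignoring any trailing slash.
    last_segment = path.rstrip("/").rsplit("/", 1)[-1]
    if last_segment.endswith(ARCHIVE_EXTENSIONS):
        return True
    return bool(VERSION_RE.search(last_segment))
```

Under these assumptions, a link such as http://example.com/dist/Foo-1.0.tgz 
would be kept for a hidden release, while a plain home page link such as 
http://example.com/foo/ would be dropped.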

However, since the original request to access hidden versions was 
aimed squarely at PyPI-hosted downloads, the original use case could 
still be met simply by including only PyPI-hosted links for "hidden" 
releases, thereby ensuring that other links are shown only for 
"current" versions -- i.e., the only versions whose 
home/download/description links package authors would expect to have 
to keep up to date.
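
To make the proposed rule concrete, here is a small Python sketch of the 
filtering step.  The data layout (a list of dicts with "hidden" and 
"links" keys), the function name, and the set of hostnames counted as 
PyPI-hosted are assumptions made purely for illustration; the actual 
PyPI implementation would look different.

```python
from urllib.parse import urlparse

# Assumption: hostnames whose links count as "PyPI-hosted".
PYPI_HOSTS = {"pypi.python.org"}


def links_for_simple_index(releases):
    """Return the links to show on a project's /simple page.

    `releases` is a hypothetical list of dicts, each with a boolean
    "hidden" flag and a "links" list of URLs.
    """
    shown = []
    for release in releases:
        for url in release["links"]:
            pypi_hosted = urlparse(url).hostname in PYPI_HOSTS
            # Hidden releases contribute only PyPI-hosted links;
            # current releases contribute all of their links.
            if pypi_hosted or not release["hidden"]:
                shown.append(url)
    return shown


foo_releases = [
    {"hidden": True, "links": [
        "http://pypi.python.org/packages/source/F/Foo/Foo-1.0.tgz",
        "http://old-home.example.com/foo/",
    ]},
    {"hidden": False, "links": [
        "http://pypi.python.org/packages/source/F/Foo/Foo-1.1.tgz",
        "http://new-home.example.com/foo/",
    ]},
]
foo_links = links_for_simple_index(foo_releases)
```

In this example, Foo 1.0's archive stays reachable, but its defunct home 
page link no longer appears; only 1.1's home page is listed.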

Making such a change would immediately fix many problematic/ambiguous 
links in the /simple index, where out-of-date or no-longer-available 
links are shown.  (It would also fix the security issue whereby 
someone acquiring a no-longer-in-service URL could point it at trojan 
downloads.)


