[Catalog-sig] Migrating away from scanning home pages (was: Deprecate External Links)

Donald Stufft donald.stufft at gmail.com
Thu Feb 28 12:45:45 CET 2013


On Thursday, February 28, 2013 at 5:55 AM, M.-A. Lemburg wrote:
> I think we all agree that scanning arbitrary HTML pages
> for download links is not a good idea and we need to
> transition away from this towards a more reliable system.
> 
> Here's an approach that would work to start the transition
> while not breaking old tools (sketching here to describe the
> basic idea):
> 
> Limiting scans to download_url
> ------------------------------
> 
> Installers and similar tools should preferably no longer scan
> all the links on the /simple/ index, but instead only look at
> the download links (which can be defined in the package
> meta data) for packages that don't host files on PyPI.
> 
> Going only one level deep
> -------------------------
> 
> If the download links point to a meta-file named
> "<packagename>-<version>-downloads.html#<sha256-hashvalue>",
> the installers download that file, check whether the
> hash value matches and if it does, scan the file in
> the same way they would parse the /simple/ index page of
> the package - think of the downloads.html file as a symlink
> to extend the search to an external location, but in a
> predefined and safe way.
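
A minimal sketch of the check described above, to make the idea
concrete. It assumes the URL fragment is the bare hex sha256 digest of
the meta-file (as in the naming scheme quoted above) and that the file
body has already been fetched. The names verify_and_scan and
LinkCollector are made up for illustration and are not part of any
existing installer.

```python
import hashlib
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags, as on a /simple/ index page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the meta-file location.
                self.links.append(urljoin(self.base_url, href))


def verify_and_scan(download_url, body):
    """Check body (bytes) against the sha256 fragment, then list its links."""
    url, expected = urldefrag(download_url)
    if not url.endswith("-downloads.html") or not expected:
        raise ValueError("not a hashed downloads.html link")
    actual = hashlib.sha256(body).hexdigest()
    if actual != expected.lower():
        raise ValueError("sha256 mismatch for " + url)
    # Only a verified file is parsed; nothing it links to is followed
    # further, which keeps the scan one level deep.
    parser = LinkCollector(url)
    parser.feed(body.decode("utf-8"))
    return parser.links
```

The returned links would then be handled exactly like the links found
on the package's /simple/ page, including any hash fragments they
carry.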
> 
> Comments
> --------
> 
> * The creation of the downloads.html file is left to the
> package owner (we could have a tool to easily create it).
> 
> * Since the file would use the same format as the PyPI
> /simple/ index directory listing, installers would be
> able to verify the embedded hash values (and later
> GPG signatures) just as they do for files hosted directly
> on PyPI.
> 
> * The URL of the downloads.html file, together with the
> hash fragment, would be placed into the setup.py
> download_url variable. This is supported by all recent
> and not so recent Python versions.
> 
> * No changes to older Python versions of distutils are
> necessary to make this work, since the download_url
> field is a free form field.
> 
> * No changes to existing distutils meta data formats are
> necessary, since the download_url field has always
> been meant for download URLs.
> 
> * Installers would not need to learn about a new meta
> data format, because they already know how to parse
> PyPI style index listings.
> 
> * Installers would prefer the above approach for downloads,
> and warn users if they have to fall back to the old
> method of scanning all links.
> 
> * Installers could impose extra security requirements,
> such as only following HTTPS links and verifying
> all certificates.
> 
> * In a later phase of the transition we could have
> PyPI cache the referenced distribution files locally
> to improve reliability. This would turn the push
> strategy for uploading files to PyPI into a pull
> strategy for those packages and make things a lot
> easier to handle for package maintainers.
> 
I don't have time to respond to the rest right now, but I don't think
this last point is doable. The purpose of the legalese you pointed out
is to make it possible for PyPI to serve those files legally. We don't
know whether PyPI is allowed to distribute externally hosted files, so
PyPI can't cache them.
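
Going back to the download_url bullet in the list above, here is a
rough sketch of what a helper tool for package owners might compute.
The function name make_download_url, the base URL and the example
package are assumptions for illustration only; nothing like this
exists in distutils.

```python
import hashlib


def make_download_url(base_url, name, version, body):
    """Return the meta-file URL with its sha256 digest as the fragment.

    body is the exact content (bytes) of the downloads.html file that
    will be published under base_url.
    """
    filename = "{}-{}-downloads.html".format(name, version)
    digest = hashlib.sha256(body).hexdigest()
    return "{}/{}#{}".format(base_url.rstrip("/"), filename, digest)


# Hypothetical usage in a setup.py:
#
#     setup(
#         name="foo",
#         version="1.0",
#         download_url=make_download_url(
#             "https://example.com/dist", "foo", "1.0", html_bytes),
#     )
```

Any change to the downloads.html file changes the digest, so the
download_url would have to be regenerated and the metadata updated
whenever the file is edited.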
> 
> What do you think ?
> 
> -- 
> Marc-Andre Lemburg
> eGenix.com (http://eGenix.com)

