[Catalog-sig] Deprecate External Links

holger krekel holger at merlinux.eu
Wed Feb 27 18:22:14 CET 2013


On Wed, Feb 27, 2013 at 10:26 -0500, Donald Stufft wrote:
> PyPI is now being served with a valid SSL certificate, and the
> tooling has begun to incorporate SSL verification of PyPI into
> the process. This is _excellent_ and the parties involved should
> all be thanked. However, there is still another massive area of
> insecurity within the packaging toolchain.
> 
> For those who don't know, when you attempt to install a particular
> package, a number of URLs are visited. The steps look roughly
> like this:
> 
>     1. Visit http://pypi.python.org/simple/Package/ and attempt to
>         collect any links that look like they're installable (tarballs,
>         #egg=, etc.).
>         (Note: /simple/Package/ contains the download_url, the home_page,
>         and any link contained in the long_description.)
>     2. Visit any link referenced as home_page and attempt to
>         collect any links that look like they're installable.
>     3. Visit any link referenced in a package's dependency_links and
>         attempt to collect any links that look like they're installable.
>     4. Take all of the collected links, determine which one best
>         matches the given requirement spec, and download it.
>     5. Rinse and repeat for every dependency in the requirement
>         set.
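Steps 1 to 3 above boil down to scraping anchors out of HTML pages. Below is a minimal sketch using only the Python standard library. Several things are assumptions, not how pip or setuptools actually decide: the "installable" heuristic (archive suffixes or an `#egg=` fragment), the convention that home_page links are marked `rel="homepage"` (legacy PyPI pages marked them this way, but real installers apply more rules), the function names, and the example.org/github.com URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin

# Assumed heuristic for "looks installable"; real tools use more rules.
INSTALLABLE_SUFFIXES = (".tar.gz", ".tgz", ".tar.bz2", ".zip", ".egg")


class SimplePageParser(HTMLParser):
    """Collect (absolute_url, rel) pairs for every <a href> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if href:
            # Resolve relative hrefs against the page URL.
            self.links.append((urljoin(self.base_url, href), attrs.get("rel")))


def looks_installable(url):
    """True for archive-like paths or links carrying an #egg= fragment."""
    path, fragment = urldefrag(url)
    return fragment.startswith("egg=") or path.lower().endswith(INSTALLABLE_SUFFIXES)


def scan_simple_page(page_html, base_url):
    """Split a /simple/Package/ page into direct candidates and pages to spider."""
    parser = SimplePageParser(base_url)
    parser.feed(page_html)
    parser.close()
    installable = [url for url, _ in parser.links if looks_installable(url)]
    # Step 2: home_page links get fetched and scanned again (assumed rel marker).
    to_spider = [url for url, rel in parser.links
                 if rel == "homepage" and not looks_installable(url)]
    return installable, to_spider


if __name__ == "__main__":
    page = """
    <a href="../../packages/source/F/Foo/Foo-1.0.tar.gz">Foo-1.0.tar.gz</a>
    <a href="http://foo.example.org/" rel="homepage">1.0 home_page</a>
    <a href="https://github.com/example/foo/tarball/master#egg=Foo-dev">dev</a>
    """
    installable, to_spider = scan_simple_page(page, "https://pypi.python.org/simple/Foo/")
    print(installable)
    print(to_spider)
```

Every URL in `to_spider` means one more host to fetch and parse before resolution can even start. That is the spidering this proposal wants to remove.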
> 
> I propose we deprecate the external links that PyPI publishes
> on the /simple/ indexes, which exist because of PyPI's history.
> Ideally, in some number of months (1? 2?) we would stop adding
> these links for new releases, leaving the existing ones intact, and
> then a few months later remove the existing links completely.
> 
> Reasoning:
>   1. It is difficult to secure the process of spidering external links
>     for download.
>     1a. The only way I can think of offhand is to require uploading
>           a hash of the expected file to PyPI along with the download
>           link, and to remove all URLs except the download_url. This
>           has the effect that only one file can be associated with a
>           particular release.
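On the installer side, the hash check in 1a is simple once a trusted digest is available. Below is a minimal sketch. It assumes SHA-256 (the proposal does not name an algorithm) and assumes the expected digest was obtained from PyPI over verified HTTPS. `verify_download` and `sha256_of` are hypothetical helpers, not part of any existing installer.

```python
import hashlib


def sha256_of(path, chunk_size=64 * 1024):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Raise if the file fetched from the external host differs from PyPI's record."""
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(
            f"hash mismatch for {path}: expected {expected_sha256}, got {actual}"
        )
    return actual
```

Each digest has to be recorded on PyPI ahead of time, so an installer can only check files PyPI already knows about. Links discovered by spidering cannot be verified this way, which is why 1a keeps only the download_url.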

The main means of securing against tampering is author signatures
and their verification by installers. If we have that, the download
location does not matter (PyPI/CDN/Google/...).

>   2. External links decrease the expected uptime for a particular set
>       of requirements. PyPI itself has become very stable; however,
>       the same cannot be said for every linked host that the toolchain
>       processes. Each new host is an additional single point of failure (SPOF).
>
>       Ex: I depend on PyPI and on 10 packages hosted on other external
>             hosts. If each service has 99% uptime, my expected chance of
>             being able to install all my requirements is ~89.5% (0.99 ** 11).
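The arithmetic in this example multiplies per-host availabilities. That is only valid if hosts fail independently of each other, an assumption the example implies but does not state. The helper name below is hypothetical.

```python
import math


def combined_availability(availabilities):
    """Chance that every required host is up at once, assuming independent failures."""
    return math.prod(availabilities)


pypi_only = combined_availability([0.99])
pypi_plus_ten = combined_availability([0.99] * 11)  # PyPI + 10 external hosts

print(f"PyPI only:        {pypi_only:.1%}")      # 99.0%
print(f"PyPI + 10 others: {pypi_plus_ten:.1%}")  # 89.5%
```

Each extra host multiplies in another factor below 1. The chance of a successful install therefore drops with every externally hosted dependency.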

There are many links that go to Google, Bitbucket or GitHub;
I doubt those services have worse availability than pypi.python.org.
If anything, it is better.

Also, we would be losing a lot of packages, because I expect a
non-trivial number of packages will never be transferred to
pypi.python.org, no matter how cool people here think that would be.

Why not first build good infrastructure and capacity for
pypi.python.org, so that people *want* to move their files there?

best,
holger


>   3. It breaks the ability of a CDN and/or mirroring infrastructure to provide
>       increased uptime and better latency/throughput across the globe.
>   4. Privacy implications: as a user, when I run `pip install Foo` it is not
>       particularly obvious which hosts I will be issuing requests against.
>       It is obvious that I will be contacting PyPI, and I will have made the
>       decision to trust PyPI; however, it is not obvious which other hosts will
>       be able to gather information about me, including which packages I am
>       installing. This becomes even harder to determine the deeper
>       my dependency tree goes.
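The privacy point in item 4 can be made concrete. The hosts that learn about an install are the union of the hostnames of every URL visited across the whole dependency tree. Below is a minimal sketch. The `dependencies` and `links` mappings are hypothetical stand-ins for what an installer discovers while resolving, and the package names and hostnames are made up.

```python
from urllib.parse import urlsplit


def hosts_contacted(root, dependencies, links):
    """Return every hostname contacted while resolving root and its dependencies.

    dependencies: package name -> list of dependency names
    links: package name -> list of URLs visited for that package
    """
    seen, stack, hosts = set(), [root], set()
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue  # shared dependencies are only resolved once
        seen.add(pkg)
        for url in links.get(pkg, []):
            hosts.add(urlsplit(url).hostname)
        stack.extend(dependencies.get(pkg, []))
    return hosts


dependencies = {"Foo": ["Bar"], "Bar": ["Baz"], "Baz": []}
links = {
    "Foo": ["https://pypi.python.org/simple/Foo/"],
    "Bar": ["https://pypi.python.org/simple/Bar/", "http://bar.example.org/"],
    "Baz": ["https://pypi.python.org/simple/Baz/",
            "https://github.com/example/baz/tarball/master#egg=Baz-dev"],
}
print(sorted(hosts_contacted("Foo", dependencies, links)))
# ['bar.example.org', 'github.com', 'pypi.python.org']
```

In this made-up tree, the user only chose to trust PyPI, yet installing Foo reveals the install to three hosts. The list grows with each level of the tree.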


> _______________________________________________
> Catalog-SIG mailing list
> Catalog-SIG at python.org
> http://mail.python.org/mailman/listinfo/catalog-sig


