[Catalog-sig] Deprecate External Links

Aaron Meurer asmeurer at gmail.com
Wed Feb 27 18:52:16 CET 2013


On Feb 27, 2013, at 10:22 AM, holger krekel <holger at merlinux.eu> wrote:

> On Wed, Feb 27, 2013 at 10:26 -0500, Donald Stufft wrote:
>> PyPI is now being served with a valid SSL certificate, and the
>> tooling has begun to incorporate SSL verification of PyPI into
>> the process. This is _excellent_ and the parties involved should
>> all be thanked. However there is still another massive area of
>> insecurity within the packaging tool chain.
>>
>> For those who don't know, when you attempt to install a particular
>> package, a number of URLs are visited. The steps look roughly
>> like this:
>>
>>    1. Visit http://pypi.python.org/simple/Package/ and attempt to
>>        collect any links that look installable (tarballs,
>>        #egg=, etc).
>>        Note: /simple/Package/ contains download_url, home_page,
>>        and any link that is contained in the long_description.
>>    2. Visit any link referenced as home_page and attempt to
>>        collect any links that look installable.
>>    3. Visit any link referenced in dependency_links and attempt
>>        to collect any links that look installable.
>>    4. Take all of the collected links and determine which one
>>        best matches the requirement spec given and download it.
>>    5. Rinse and repeat for every dependency in the requirement
>>        set.
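
A rough sketch of step 1 above, to make the link collection concrete.
This is not how pip or setuptools actually implement it: the names
LinkCollector, looks_installable and installable_links, and the list of
archive suffixes, are all assumptions made for illustration. It works
on an HTML string so no network access is needed.

```python
from html.parser import HTMLParser

# Assumed set of archive suffixes; the real tools recognize more formats.
ARCHIVE_SUFFIXES = (".tar.gz", ".tgz", ".tar.bz2", ".zip")


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def looks_installable(url):
    """Heuristic: an archive file name or an #egg= fragment."""
    path = url.split("#", 1)[0]  # drop fragments such as #md5=...
    return "#egg=" in url or path.endswith(ARCHIVE_SUFFIXES)


def installable_links(html):
    """Return the links on the page that look installable, in page order."""
    collector = LinkCollector()
    collector.feed(html)
    return [url for url in collector.links if looks_installable(url)]
```

Steps 2 and 3 would repeat the same collection on every home_page and
dependency_links URL, which is where the external hosts come in.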
>>
>> I propose we deprecate the external links that PyPI has published
>> on the /simple/ indexes which exist because of the history of PyPI.
>> Ideally in some number of months (1? 2?) we would turn off adding
>> these links from new releases, leaving the existing ones intact, and
>> then a few months later the existing links would be removed completely.
>>
>> Reasoning:
>>  1. It is difficult to secure the process of spidering external links
>>    for download.
>>    1a. The only way I can think of offhand is by requiring uploading
>>          a hash of the expected files to PyPI along with the download
>>          link and removing all urls except for the download_url. This
>>          has the effect that only 1 file can be associated with a particular
>>          release.
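
A minimal sketch of the check described in 1a, assuming the hash is a
SHA-256 hex digest (the proposal does not name an algorithm) and using a
hypothetical helper name, matches_expected_hash. The installer would
compare the downloaded bytes against the digest registered on PyPI
before installing anything.

```python
import hashlib
import hmac


def matches_expected_hash(data: bytes, expected_sha256: str) -> bool:
    """Return True if data hashes to the expected SHA-256 hex digest.

    SHA-256 is an assumption here; any strong hash would serve.
    """
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking where the two strings differ.
    return hmac.compare_digest(actual, expected_sha256.strip().lower())
```

A file fetched from download_url would be rejected whenever this
returns False, regardless of which host served it.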
>
> The main means of securing against tampering is author-signatures
> and verification by installers.  If we have that, the download location
> does not matter (pypi/CDN/google/...).
>
>>  2. External links decrease the expected uptime for a particular set
>>      of requirements. PyPI itself has become very stable, however
>>      the same cannot be said for all of the linked hosts that the
>>      toolchain processes. Each new host is an additional SPOF.
>>
>>      Ex: I depend on PyPI and 10 other external packages, each
>>            service has a 99% uptime so my expected uptime to
>>            be able to install all my requirements would be ~89% (0.99 ** 11).
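
The arithmetic in that example can be checked with a short sketch. It
assumes the hosts fail independently of one another, which the example
implies but does not state; combined_uptime is a hypothetical name.

```python
def combined_uptime(per_host_uptime: float, host_count: int) -> float:
    """Chance that every host is up at once, assuming independent failures."""
    return per_host_uptime ** host_count


# PyPI plus 10 external hosts, each with 99% uptime.
print(round(combined_uptime(0.99, 11), 3))  # prints 0.895
```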
>
> There are many links which go to google, bitbucket or github -
> I doubt those services have worse availability than pypi.python.org;
> if anything, it is better.
>
> Also we would be losing a lot of packages because I expect there to
> be a non-trivial number of packages which will not be transferred to
> pypi.python.org no matter how much people here think it's cool.
>
> Why not first have good infrastructure and capacity with
> pypi.python.org so that people *want* to move their files there?

If you change the policy to also download from links, but only official
links that the package maintainer has explicitly put there, with no
crawling, isn't it fair to say, "If you want pip to install your
package, you need to tell PyPI where it is, explicitly. And if you
release a new version, you need to tell PyPI about that new version,
or else it will continue to install the old version"?  I suppose they
could also just have a link to "latest tarball" if they really want to
be lazy.

PyPI/pip are not like Linux package systems. They should be under no
obligation to always try to get the latest version without any work
by the package maintainer, especially since there's no team of
people doing it: the whole thing happens automatically through
heuristics.

Aaron Meurer

>
> best,
> holger
>
>
>>  3. Breaks the ability for a CDN and/or mirroring infrastructure to provide
>>      increased uptime and better latency/throughput across the globe.
>>  4. Privacy implications: as a user it is not particularly obvious when
>>      I run `pip install Foo` what hosts I will be issuing requests against.
>>      It is obvious that I will be contacting PyPI and I will have made the
>>      decision to trust PyPI however it is not obvious what other hosts will
>>      be able to gather information about me, including what packages I am
>>      installing. This becomes even more difficult to determine the deeper
>>      my dependency tree goes.
>
>
>> _______________________________________________
>> Catalog-SIG mailing list
>> Catalog-SIG at python.org
>> http://mail.python.org/mailman/listinfo/catalog-sig