[Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability

Tarek Ziadé ziade.tarek at gmail.com
Tue Jun 15 20:47:52 CEST 2010


On Tue, Jun 15, 2010 at 7:43 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> Tarek Ziadé wrote:
>> On Tue, Jun 15, 2010 at 7:15 PM, Ronald Oussoren <ronaldoussoren at mac.com> wrote:
>>>
>>> On 15 Jun, 2010, at 19:02, Tarek Ziadé wrote:
>>>
>>>> On Tue, Jun 15, 2010 at 6:02 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>>>>> Alexis Métaireau wrote:
>>>>>> Hello,
>>>>>>
>>>>>> Firstly, as Tarek said in another thread, I'm afraid this kills PEP 381
>>>>>> about building a mirroring infrastructure.
>>>>>> Having an infrastructure hosted on a cloud platform may be comfortable, and
>>>>>> probably necessary to have a system running 24/7, but
>>>>>> we need to make sure it remains possible to create new public mirrors
>>>>>> outside of the Amazon (or whatever) cloud infrastructure.
>>>>>
>>>>> The proposal doesn't prevent that. However, please note that
>>>>> setting up public mirrors not under PSF control has its own
>>>>> set of (legal) problems, which the PSF hosted cloud setup avoids.
>>>>
>>>> Mirrors already exist out there, so unless you ban them (which would
>>>> be a really bad idea), setting up a cloud will not fix any legal
>>>> issue, if you think there is one.
>>>>
>>>> In any case, you can't prevent people from creating mirrors even if you
>>>> say it's illegal. Moreover, having mirrors provided by the community
>>>> is far better than relying on a single entity (the PSF) for this
>>>> (if we think "decentralized").
>>>
>>> Why is having community mirrors better than one managed by the PSF?
>>
>> Because it's no longer controlled by a single entity. For example,
>> if something breaks in the system and needs human intervention while
>> the sysadmins are not available, we get downtime.
>
> I'm not sure I understand: if the PyPI server goes down, the
> data will still be readily available on Amazon S3 and Cloudfront
> caches - the cronjobs copy over the PyPI server content to S3
> and Cloudfront serves it up from there.
>
> And if Cloudfront or S3 goes down, client tools could still
> try to access the PyPI server. (I'll add a note about that to
> the proposal.)

This can't beat a distributed network of mirrors that do not
depend on a single provider like Amazon.

As a matter of fact, we have suffered from this at bitbucket.org:
Amazon was having problems, so Bitbucket was slow and sometimes
down.

If Bitbucket had had a distributed network of mirrors hosted
at different providers back then, that wouldn't have happened.

What I have learned lately in this area is that a lot of cheap servers spread
all over the world in different datacenters are more reliable than a single provider.

And we happen to have this network already: lots of people
will host a PyPI mirror as soon as it's easy to set one up, IMHO.
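A minimal client-side sketch of this idea in Python, assuming the client is simply given an ordered list of mirror base URLs. The helper names `fetch_with_fallback` and `urlopen_fetch` and the example.org hosts are hypothetical, not part of PEP 381 or any existing tool. Each mirror is tried in turn and the next one is used when a request fails, so an outage at one provider only costs a retry:

```python
import urllib.request
from typing import Callable, Iterable, List, Tuple

# Example mirror list; the example.org hosts are hypothetical placeholders.
MIRRORS = [
    "https://pypi.python.org/simple/",
    "https://mirror-a.example.org/simple/",
    "https://mirror-b.example.org/simple/",
]


def urlopen_fetch(url: str) -> bytes:
    """Fetch a URL with the standard library; failures raise OSError subclasses."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


def fetch_with_fallback(path: str, mirrors: Iterable[str],
                        fetch: Callable[[str], bytes] = urlopen_fetch) -> bytes:
    """Try each mirror in order and return the first successful response body."""
    errors: List[Tuple[str, OSError]] = []
    for base in mirrors:
        url = base.rstrip("/") + "/" + path.lstrip("/")
        try:
            return fetch(url)
        except OSError as exc:  # URLError, HTTPError and timeouts all subclass OSError
            errors.append((url, exc))  # remember the failure, move on to the next mirror
    raise RuntimeError(f"all mirrors failed: {errors}")
```

For instance, `fetch_with_fallback("foo/", MIRRORS)` would request `https://pypi.python.org/simple/foo/` first and only contact the other hosts if that fails. The `fetch` parameter is injectable so the fallback logic can be exercised without a network.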

Regards
Tarek

-- 
Tarek Ziadé | http://ziade.org

