[Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability

exarkun at twistedmatrix.com exarkun at twistedmatrix.com
Fri Jun 18 23:47:00 CEST 2010


On 09:39 pm, ziade.tarek at gmail.com wrote:
>On Thu, Jun 17, 2010 at 6:30 AM, Ian Bicking <ianb at colorstudy.com> wrote:
>>On Wed, Jun 16, 2010 at 1:37 PM, "Martin v. Löwis" <martin at v.loewis.de>
>>wrote:
>>>>
>>>>It is likely that some people will set up a mirror and then "forget"
>>>>to take care of it. Like our buildbots, really.
>>>
>>>
>>>The same can happen to any infrastructure, though. Amazon may decide to
>>>change the setup, and then the automated update procedure would break.
>>>Of course, they would give advance notice, but then somebody would
>>>have to react to that advance notice.
>>
>>That's not very likely, and if something does change it will be
>>extremely well announced and documented. Amazon is providing a
>>commercial service that lots of people rely on; their process is
>>formalized and professionalized. And if Amazon makes mistakes, they'll
>>figure out how to avoid them next time, while mirror providers are a
>>rotating crew that is unlikely to easily or reliably learn from past
>>mistakes.
>
>If a mirror manager doesn't do a good job, he'll just be taken out of
>the ring after a while.
>If we depend 100% on Amazon and there's a problem, mirroring will be
>down for the time being and we won't be able to do anything about it.
>>If we actually understood each time PyPI broke and fixed it, none of
>>this would be a problem; I'm not blaming anyone for that, but it's also
>>not going to change, and adding lots of mirror systems just adds more
>>systems with exactly the same management problems that our current
>>system has.
>
>Yes, but the difference is that you don't put all your eggs in the same
>basket: it's very unlikely that ALL community mirrors will be down at
>the same time, so a fallback mechanism on the client side will raise
>availability automatically.
>
>About Amazon: what will happen to their offer in 5 years? Will our
>Cloud-PyPI infrastructure still work? What will be the workload to
>maintain it? You can't be 100% sure the Python community will be able
>to dedicate that time. PyPI works today because it is not forced by a
>third party to evolve; it can evolve at its own pace.
>
>On the contrary, once the mirroring system is set up, it will be dead
>easy to add/remove a mirror in the ring, and no node will be a single
>point of failure (SPOF).
>
>IMHO it's a bad idea to make this piece of our infrastructure depend
>on one third-party commercial entity when we can provide a community
>answer.
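A minimal sketch of the client-side fallback described in the quoted message, in Python. It assumes a client that simply tries mirrors in a fixed order and returns the first successful response. The hostnames are hypothetical placeholders, and fetch_with_fallback and combined_availability are illustrative names, not part of any existing PyPI client. The availability formula assumes mirrors fail independently, which real mirrors (for example, ones syncing from the same master) may not.

```python
import urllib.request
from typing import Callable, Iterable

# Hypothetical mirror list; these hostnames are placeholders, not real mirrors.
MIRRORS = [
    "https://pypi.example.org",
    "https://mirror-a.example.org",
    "https://mirror-b.example.org",
]


def default_fetch(url: str, timeout: float = 10.0) -> bytes:
    """Fetch a URL over HTTP(S) and return the response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_with_fallback(path: str, mirrors: Iterable[str],
                        fetch: Callable[[str], bytes] = default_fetch) -> bytes:
    """Try each mirror in order and return the first successful response.

    URLError, HTTPError, timeouts and connection errors are all OSError
    subclasses, so catching OSError moves on to the next mirror.
    """
    errors = []
    for base in mirrors:
        url = base.rstrip("/") + "/" + path.lstrip("/")
        try:
            return fetch(url)
        except OSError as exc:
            errors.append((url, exc))
    raise RuntimeError(f"all mirrors failed: {errors}")


def combined_availability(per_mirror: float, n: int) -> float:
    """Probability at least one of n mirrors is up, assuming each is up
    with probability per_mirror and failures are independent."""
    return 1.0 - (1.0 - per_mirror) ** n
```

With three mirrors each up 99% of the time, combined_availability(0.99, 3) is about 0.999999, but only under the independence assumption; correlated failures would lower it.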

There are (multiple!) open source implementations of the Amazon API.  If 
Amazon decides to discontinue their cloud services (something I doubt 
should really be one of the top ten concerns here), then anyone else can 
set up their own cloud with the same interface.

If I were going to run a PyPI mirroring service, I'd probably want to do 
it this way *anyway* because managing virtual machines is far easier 
than managing actual hardware.

So there are probably many other, much more significant issues to worry
about.

Jean-Paul

