[Distutils] Fwd: The state of PyPI

Alex Clark aclark at aclark.net
Tue Sep 27 19:59:40 CEST 2011


On 9/27/11 1:25 PM, Tarek Ziadé wrote:
> On Tue, Sep 27, 2011 at 5:35 PM, Jim Fulton <jim at zope.com> wrote:
>
>>>> I understand where you're coming from but, ..
>
>>> Sorry, I don't understand what you imply here.
>
>> I understand why you don't want to rely on a proprietary solution.
>
> But it's true that I don't want to rely on a proprietary solution.
> That's based on a good reason I think, mentioned at the end of this
> mail.
>
> ...
>>
>>> If you're saying that CloudFront is proven technology and that we
>>> should not worry about relying on them, then I think we can do better
>>> for the community than getting locked in to this, and continue to work
>>> on an open protocol where everyone can participate by providing a spare
>>> server.  But maybe that's just me ?
>>
>> It's nice to have a hobby. :)
>
> I think you've missed what we, a bunch of hobbyists, did in the past two years:
>
> + 5 community mirrors are up and running, collecting download stats
> that get merged
> + pip does work with the mirrors, and offers fallback options
>
> It's too bad you were not there to tell us we were wasting our time
> and how awesome CloudFront was ;)
>
> But at this point, the shortest road to a better PyPI is to add the
> mirroring support to other clients; pip led the way. And if
> zc.buildout uses Distribute, it should get this feature at some point.
>
> But having a CloudFront-based PyPI could also be interesting in
> parallel, I am not saying it's not. But the project is stalled, and
> has the flaws I've mentioned.
>
>> But I don't want to have to update buildout *just* because of an itch
>> to have a custom protocol.
>
> I kind of wonder how hard it would be to have a standalone pypi
> download client, ripped off from python 3.3's packaging, so you would
> not have to worry about this.
>
> And, well, you do not sound like you want to spend time in these
> matters in any case, so if someone brings a patch I hope you will not
> refuse it.
>
>>> But the use case is usually: PyPI is down, we fallback to a mirror. I
>>> don't think it's more complicated than this.
>>
>> I don't agree.  On multiple levels.  PyPI is often up but slow.
>
> That's an orthogonal issue :  any server can be slow.
>
> A better way to drastically speed up buildout is to download and
> build stuff in parallel, IMO.
>
>
>> It's also in the wrong place.  A CDN should provide better performance,
>> reliability and locality.
>
> Locality is indeed important, and picking up the nearest server is great.
> Reliability is also solved by the mirrors.
>
>>
>> A client has to:
>>
>> - try pypi
>> - fallback to "last"
>> - If that's down, decide what other indexes to check
>>
>> I don't see how having timestamps helps unless you know
>> what the current timestamp is, unless you say that you'll reject
>> a mirror with a timestamp more than some period in the past.
>
> How hard is it to make those decisions ?
>
> Do you really think getting the current timestamp is that hard ?
>
> And the mirror timestamp,
>
>    http://b.pypi.python.org/last-modified
>
> In all you've said, I fail to see what is so complicated or
> time-consuming about it.
>
> The ordering I see is:
>
> normal behavior:
> - if the cache is too old: get the list of mirrors  (->  the list of
> mirrors and their timestamps get cached)
> - pick the closest one
> - use it
>
> the server times out:
> - try the "next closest"
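
A minimal sketch of the ordering above, for illustration only. The names
Mirror, MirrorCache and download, the latency-based notion of "closest",
and the injected fetch_mirrors / fetch callables are all assumptions made
for this example; none of them come from pip, Distribute or packaging.

```python
import time
from dataclasses import dataclass


@dataclass
class Mirror:
    url: str
    latency: float  # hypothetical "closeness" metric: round trip, in seconds


class MirrorCache:
    """Caches the mirror list and refreshes it once it is too old."""

    def __init__(self, fetch_mirrors, max_cache_age=3600.0, clock=time.time):
        self._fetch_mirrors = fetch_mirrors  # callable returning a list of Mirror
        self._max_cache_age = max_cache_age
        self._clock = clock
        self._mirrors = None
        self._fetched_at = 0.0

    def mirrors(self):
        now = self._clock()
        if self._mirrors is None or now - self._fetched_at > self._max_cache_age:
            # Cache missing or too old: get the list again, closest first.
            self._mirrors = sorted(self._fetch_mirrors(), key=lambda m: m.latency)
            self._fetched_at = now
        return self._mirrors


def download(path, cache, fetch):
    """Try the closest mirror, then the next closest when one fails."""
    last_error = None
    for mirror in cache.mirrors():
        try:
            return fetch(mirror.url, path)
        except OSError as exc:  # TimeoutError is a subclass of OSError
            last_error = exc
    raise RuntimeError("no mirror answered") from last_error
```

Here fetch(url, path) stands for whatever HTTP call the client already
makes; passing it in keeps the selection logic separate from the
transport.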
>
>
>> It's not clear what this time delta should be and, in any case,
>> the client needs to first validate a mirror by checking its timestamp.
>
> This is the job of the client, yes: an option that says, discard
> mirrors that are more than 1 day old, or 5 hours, etc.
>
> Keeping a local cache that gets updated eventually is sufficient.
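
A possible shape for such an option, as a sketch only. It assumes the
client has already fetched and parsed each mirror's last-modified value
into a timezone-aware datetime; fresh_mirrors and its parameters are
hypothetical names, not an existing API.

```python
from datetime import datetime, timedelta, timezone


def fresh_mirrors(last_modified, max_age, now=None):
    """Return the mirror names whose last sync is at most max_age old.

    last_modified maps a mirror name to the (already parsed, UTC)
    timestamp that the mirror publishes.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return [name for name, stamp in last_modified.items()
            if now - stamp <= max_age]
```

With max_age=timedelta(days=1) a mirror that last synced two hours ago is
kept and one that last synced two days ago is discarded; the threshold
itself stays a user-facing setting.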
>
>> I think this protocol is going to be hard to get right.
>
> Maybe ? but if a v1 allows us to switch from server 1 being down to
> server 2, it's already a success, no ?
>
> Servers that *we*, the community, manage.
>
>
>
>>>>
>>>> - It either requires extra dns calls or relies too heavily on the last
>>>>   mirror, which is likely to be the least reliable.
>>>
>>> Once you have the list, I don't think you require any extra call.
>>>
>>> see http://hg.python.org/cpython/file/84280fac98b9/Lib/packaging/pypi/mirrors.py
>>
>> It has to make extra dns calls to resolve the other mirror names to ips.
>
> Yeah, once per session. But in any case, this is not a decision you're
> making on every download. It's something you do when you start to
> download stuff, and/or when a server times out.
>
> You stick with a server once it's working
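
The "stick with a server once it's working" behaviour could look roughly
like this. MirrorSession and the fetch callable are hypothetical names
used only for illustration, and the host list is assumed to be ordered
already.

```python
class MirrorSession:
    """Keeps using one host for a whole session, moving on only on failure."""

    def __init__(self, hosts, fetch):
        self._hosts = list(hosts)  # assumed to be ordered, preferred host first
        self._fetch = fetch        # callable(host, path); raises OSError on failure
        self._index = 0            # position of the host currently in use

    def get(self, path):
        while self._index < len(self._hosts):
            host = self._hosts[self._index]
            try:
                return self._fetch(host, path)
            except OSError:
                # Give up on this host for the rest of the session.
                self._index += 1
        raise RuntimeError("all mirrors failed")
```

A host that failed is never retried within the session, so the choice of
server is made at start-up and after a failure, not on every download.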
>
>>
>>
>>>> Life is short. We don't have to invent this ourselves.
>>>
>>> Ah well, yeah -- Not sure what you are proposing right now.
>>>
>>> If you imply that everything should be solved on server-side, and that
>>> we should not have mirroring
>>
>> I think we should pick a good CDN and use it.
>
> I won't object, because this is orthogonal to the mirroring stuff, but
> I am not going to scrap the mirroring efforts to move PyPI to a
> single shop.
>
> Every service on the planet, even Amazon, can be down.
>
> oh, my:
>
> - https://forums.aws.amazon.com/message.jspa?messageID=244986
> - http://money.cnn.com/2011/04/22/technology/amazon_ec2_cloud_outage/index.htm.
> - http://www.labnol.org/internet/amazon-s3-cloudfront-down/5667/
> - https://forums.aws.amazon.com/message.jspa?messageID=134012
> - https://forums.aws.amazon.com/message.jspa?messageID=177654
>
> Do we really want Amazon to handle PyPI ?
>
> I prefer a bunch of community mirrors. Heck, I have one at Mozilla,
> and might make it public one day  :)
>
> Or maybe the optimal solution is our own CDN proxy so we don't deal
> with this on the client side.
>
> <music in the background with trumpets, a flag with the Python logo
> rises, slowly>
>
> But in any case, I'd rather have a Pythoneer from our community, someone
> I can trust, behind every mirror server
>
> </music>


Assuming Jim doesn't mind someone else adding support to zc.buildout for
this, I wouldn't mind doing the work. I think stranger things have
happened than us implementing a feature not everyone agrees on, as long
as it is not going to affect any other aspect of the software (which it
wouldn't, IIUC).



Alex





>
>
> Cheers
> Tarek
>


-- 
Alex Clark · http://aclark.net


