[Catalog-sig] Rewrite PyPI for App Engine?

Sat Jun 19 01:58:00 CEST 2010

On Fri, Jun 18, 2010 at 6:27 PM, Tarek Ziadé <ziade.tarek at gmail.com> wrote:

> On Fri, Jun 18, 2010 at 6:44 PM, Ian Bicking <ianb at colorstudy.com> wrote:
> > With all the reliability discussion, I thought I'd offer a kind of
> > counterproposal, that we rewrite PyPI to use App Engine.
> >
> > Of course, this means writing code, etc., but I believe this is a
> > reasonable goal.  I think if "we" (Catalog-SIG?  PyPI maintainers?)
> > committed to using such an implementation (assuming it was of good
> > quality) that we could find people (probably not on this list) to write
> > and maintain the code.  People have already rewritten PyPI a couple
> > times, but no one knows what exactly to *do* with the rewrite so they
> > haven't gone anywhere.  And PyPI is not a particularly complicated
> > application.  I think we can set the bar high on the implementation
> > quality and that people will meet it, so long as they know their effort
> > won't be in vain.
>
> Out of curiosity: have you ever worked with the current implementation?
>
> I have a hard time understanding why some people say it's hard to work
> with; I don't think it's a valid argument.
>

I haven't looked at it in years, but I've poked around it some.  I found it
difficult, yes.

> > Why App Engine?  The primary reason I'm proposing it is because it will
> > be much easier to manage.  If it runs out of memory it won't bring down
> > a machine.  If new people maintain the system it's easy to describe how
> > to do deployments, for instance.  It's easy for people to install their
> > own PyPI instances for development and to generate patches.  Hosted
> > services can have downtimes of course, but unlike currently there are
> > other people (the App Engine maintainers) who will resolve those
> > problems.  There's still a class of bugs like badly indexed tables or
> > weird locking issues that could bring PyPI down and "we" would have to
> > fix it, and with a rewrite there's more of a risk of that, but... it'll
> > just take some testing to make sure things are okay.
> >
> > In terms of cost, I expect we can get free hosting, and packages can be
> > stored directly in the data store.  That doesn't preclude using a CDN
> > like CloudFront, but that can be handled separately.  Also since the
> > index just links to packages, packages can be incrementally uploaded to
> > a CDN.
>
> Even if I don't think it's a priority among our concerns (community
> mirrors come first), I wouldn't mind having the main PyPI UI in GAE.
>

The priorities that motivate me are:

1. Make installation more reliable with respect to PyPI
2. Decrease overall maintenance burden
3. Decrease code liability

Community mirrors only address 1, while App Engine addresses 2 and a rewrite
addresses 3.  And I think App Engine would be significantly more reliable
than PyPI with mirrors.  It has fewer moving parts, and it's built on
infrastructure that is highly automated.  Also, because it requires less
maintenance, it doesn't need active tending if someone drops out of
communication for a while or goes on vacation.

There are a significant number of failure conditions that a mirror network
doesn't protect you from.  Connection refused, connection timed out, and 500
errors are the only really obvious errors that will make a tool look to the
next mirror.  Because of potential synchronization problems, there are also a
lot of new problems a mirror network could introduce.
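
To make the failover point concrete, here is a minimal Python sketch of
the decision a client would have to make.  It is only an illustration
under assumptions: the function names, the injectable `fetch` callable,
and the choice to treat every 5xx status like a 500 are all hypothetical
and are not taken from any existing tool.

```python
import socket
import urllib.error
import urllib.request

# Errors that plainly mean "this mirror is unreachable right now".
_NETWORK_ERRORS = (ConnectionRefusedError, socket.timeout, TimeoutError)


def is_failover_error(exc):
    """Return True only for failures a client can clearly detect."""
    # HTTPError is a subclass of URLError, so it has to be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        return 500 <= exc.code < 600  # assumption: any 5xx counts
    if isinstance(exc, urllib.error.URLError):
        return isinstance(exc.reason, _NETWORK_ERRORS)
    return isinstance(exc, _NETWORK_ERRORS)


def fetch_with_failover(mirrors, path, fetch=None, timeout=10):
    """Try each mirror in order; move on only after an obvious failure."""
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()

    last_error = None
    for base in mirrors:
        url = base.rstrip("/") + "/" + path.lstrip("/")
        try:
            # A 200 response is accepted as-is, even if the mirror is stale.
            return fetch(url)
        except Exception as exc:
            if not is_failover_error(exc):
                raise
            last_error = exc
    raise RuntimeError("all mirrors failed") from last_error
```

Note what the sketch cannot see: a mirror that answers 200 with an
out-of-date index is returned to the caller without complaint, which is
the synchronization class of problem.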

> Although, if PyPI were to be ported to GAE, couldn't we reuse the
> existing code instead of rewriting from scratch?  We would just have
> to rewrite the DB layer.
>

I don't think it's worth reusing that code.

> > Besides a commitment to using the code (which I think is really
> > important to motivate people), a scrubbed dump of the database would be
> > really helpful for development.  I know we've passed around complete
> > dumps to people, but they contain private information so we can't put
> > them up publicly, which creates a speed bump for developers.
>
> Private information could be easily removed from those dumps.
>
> But I don't think it's so helpful since you have all the .sql scripts to
> create your own DB. But we could add a script to create some sample data
> on top of those scripts.
>

It's useful to have a representative data set to test with, especially for
stuff like performance testing.
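
As a rough idea of what scrubbing could look like, here is a small Python
sketch that overwrites private columns in place while keeping every row,
so the data set stays the same size for performance testing.  The table
name `users`, the column names `email` and `password`, and the use of
sqlite3 are assumptions made purely for illustration; they are not the
actual PyPI schema or database.

```python
import sqlite3


def scrub_private_columns(conn, table="users",
                          private_columns=("email", "password"),
                          placeholder="scrubbed"):
    """Overwrite private columns with a placeholder; return rows touched.

    The table and column names are hypothetical defaults and must come
    from trusted code, since identifiers cannot be bound as parameters.
    """
    assignments = ", ".join('"%s" = ?' % col for col in private_columns)
    cursor = conn.execute(
        'UPDATE "%s" SET %s' % (table, assignments),
        [placeholder] * len(private_columns),
    )
    conn.commit()
    return cursor.rowcount
```

Running it against a copy of the database, never the live one, would
leave package metadata untouched while removing the fields that keep the
dump from being published.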

-- 
Ian Bicking  |  http://blog.ianbicking.org